SAMOS Core Module 3.2 & 3.4
R Workshop (3.2) and Statistics in Ocean Sciences (3.4)
1 About SAMOS
The South African Master of Ocean Sciences (SAMOS / AOS‑SAMOS) is a joint Master’s programme organised by multiple South African and European partner universities, supported by the EU ERASMUS+ Capacity Building in Higher Education (CBHE) programme. The programme builds on the strengths of the University of Cape Town’s Applied Ocean Sciences (AOS) Master’s, and aims to train multidisciplinary ocean professionals for the evolving needs of the blue bio‑economy and for research supporting the sustainable use and conservation of marine ecosystems.
In 2026, the AOS programme serves as the pilot course for SAMOS, with teaching contributions from the SAMOS partners. The coursework component is largely delivered at the Ocean Sciences Campus of Nelson Mandela University (Gqeberha) over the February–June period.
Sources:
- https://maris.uct.ac.za/aos-samos-masters-course
- https://maris.uct.ac.za/aos-course-structure
- https://samos-edu.eu/
2 Venue, Timetable, and Content
The venue for the module is the 5th Floor Computer Lab, BCB Department, University of the Western Cape.
- Introduction to R runs from 23 to 27 February 2026 (starting the afternoon of 23 February).
- Statistics runs from 20 to 24 April 2026.
Lectures will run from 09:00 to 16:30 on the days indicated in the table below.
The module coordinator and lecturer is Prof AJ Smit (UWC, BCB, Room 4.103), and the teaching assistant for the module is Mr. Jesse Phillips (4115146@myuwc.ac.za). For queries about the SAMOS programme in general, please consult Dr. Denise Schael (Ocean Sciences Campus, Office 0210, 2nd Floor, E-Block Extension, Building 1105).
3 Description and Content
Yes, the comma in this page’s title is correct: “SAMOS Core Module 3: Introduction to R, and Statistics.” The module provides an introduction to the R software language. I will also teach Statistics.
This is a core module in your Master’s programme. You will learn to use R for data analysis, visualisation, and statistical inference. You will also learn fundamental Statistics concepts, such as hypothesis testing, probabilities, confidence intervals, regression analysis, Analysis of Variance, and other staples of Statistics. I will use real-world datasets from the biological, ecological, and environmental fields that you can use to practise applying your R and Statistics skills.
The approach taken in this module is not dissimilar from a course in Data Science. However, in this module, we will not do data science, but we will use R to actually do science. There is a difference! Any scientist who can use R is also ideally equipped to be a data scientist, and some people who have completed this module actually do just that. The difference between the two ideas, philosophies, and careers is provided in the box immediately below.
The Intro R component of this module focuses on the functionality offered by the tidyverse suite of packages. I designed this component to introduce you to a powerful set of tools for data manipulation, exploration, and visualisation. The tidyverse is a collection of R packages that work together to provide a cohesive set of functions for manipulating data. This course will cover the most popular packages in the tidyverse, including tidyr for data reshaping, dplyr for data ‘wrangling’, and ggplot2 for data visualisation. You will learn how to clean, transform, and visualise data, as well as how to use these tools to build reproducible, informative data analysis pipelines. With a focus on practical application and hands-on exercises, you will gain the skills and knowledge needed to effectively use the tidyverse in your own data analysis projects.
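As a minimal sketch of what such a pipeline might look like (it is not the module's own teaching code), the example below reshapes, summarises, and plots R's built-in airquality dataset. The object names aq_long, aq_summary, and aq_plot are illustrative only, and the choice of dataset is an assumption made for the sake of a self-contained example.

```r
# Load the three tidyverse packages named above
library(tidyr)
library(dplyr)
library(ggplot2)

# Reshape (tidyr): gather the four measurement columns into long, tidy format
aq_long <- airquality |>
  pivot_longer(
    cols = c(Ozone, Solar.R, Wind, Temp),
    names_to = "variable",
    values_to = "value"
  )

# Wrangle (dplyr): monthly mean of each variable, ignoring missing values
aq_summary <- aq_long |>
  group_by(Month, variable) |>
  summarise(mean_value = mean(value, na.rm = TRUE), .groups = "drop")

# Visualise (ggplot2): one panel per variable, each with its own y-axis
aq_plot <- ggplot(aq_summary, aes(x = Month, y = mean_value)) +
  geom_line() +
  geom_point() +
  facet_wrap(~variable, scales = "free_y") +
  labs(x = "Month", y = "Monthly mean")

aq_plot
```

Each step feeds the next, so rerunning the script from the top reproduces the figure from the raw data.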
In biological and ecological sciences, statistical methods play a crucial role in analysing and interpreting data. Some of the basic statistical methods used include:
Descriptive statistics: These methods are used to summarise and describe the basic features of a dataset, such as the mean, median, and standard deviation.
Inferential statistics: These allow you, the scientist, to make predictions and inferences about a population based on a sample of data. Common inferential statistical techniques include t-tests, ANOVA, and regression analysis.
Non-parametric statistics: Non-parametric methods are called for when the data do not meet the assumptions of parametric statistics. Examples of non-parametric techniques include the Wilcoxon rank-sum test and the Kruskal-Wallis test.
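The three families above can be sketched with functions from base R's stats package. This is an illustrative example only, using the built-in iris dataset as a stand-in for the real-world data used in class; the object names are assumptions.

```r
# Descriptive statistics: summarise sepal length in the built-in iris data
sepal <- iris$Sepal.Length
c(mean = mean(sepal), median = median(sepal), sd = sd(sepal))

# Keep two species only, for the two-sample tests
two_species <- droplevels(
  subset(iris, Species %in% c("setosa", "versicolor"))
)

# Inferential statistics: Welch two-sample t-test
t_res <- t.test(Sepal.Length ~ Species, data = two_species)
t_res

# Inferential statistics: one-way ANOVA across all three species
aov_res <- aov(Sepal.Length ~ Species, data = iris)
summary(aov_res)

# Inferential statistics: simple linear regression
lm_res <- lm(Petal.Length ~ Sepal.Length, data = iris)
summary(lm_res)

# Non-parametric alternatives to the t-test and the ANOVA
# (exact = FALSE requests the normal approximation, since the data contain ties)
w_res <- wilcox.test(Sepal.Length ~ Species, data = two_species, exact = FALSE)
k_res <- kruskal.test(Sepal.Length ~ Species, data = iris)
w_res
k_res
```

Whether a parametric or a non-parametric test is appropriate depends on checking the assumptions first, which this sketch does not do.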
4 Skills and Graduate Attributes
4.1 Core Skills
By the end of this module, you will be able to:
- Understand and use R within the RStudio IDE
- Know and understand the tidyverse suite of functions and approach to data analysis and graphics
- Understand the principles underlying tidy data
- Understand the types of data and data distributions that biologists and ecologists will frequently encounter
- Understand and be able to execute the most frequently used inferential statistical tests
- Use the R software and associated packages to undertake these analyses
- Interpret the outcomes of these analyses and use them to make probabilistic inferences about the scientific questions being asked
- Communicate the findings in written and oral form
4.2 Graduate Attributes
The graduate attributes resulting from completion of this module align with the expectations of the workplace across diverse organisations and institutions where graduates typically find employment.
5 Assessment Policy
TBD.
5.1 Progress Portfolio
Starting 23 February 2026, you will submit a thoroughly annotated .html file, produced from a Quarto document, that outlines the teaching material you covered each day as you followed along in class. Since it is made within Quarto, you will have to include code chunks, their output, and some narrative text that describes what each portion of your code does. The document must be:
- One continuous file that you add to each day (use a clear heading with the date and topic for every new day).
- Rendered to HTML and submitted daily starting 23 February 2026, on the same day as the lecture your portfolio material covers.
- Complete and readable, with the code you ran, the outputs it produced, and a few sentences explaining what the code does. Include any notes that you made for your future self, which will aid you to study the work for assessments you will encounter. Any text added must be in your own words, and explained in a way that you understand (in good English, without grammatical and spelling errors).
- Neatly structured, with short sections, headings, and clear figure/table outputs (no screenshots). I will give significant weight to the visual impression you create, so take pride in your work.
5.2 Self-assessment Tasks
Core Module 3.2 and 3.4 (Introduction to R, and Statistics) relies on regular, honest self-reflection about your grasp of each day’s lecture content. After every lecture, complete the Daily Self-Assessment Tasks to gauge your understanding; answers will be provided the following day, before introducing new content. Each task should be rated on a personal scale from 1 (no real comprehension) to 10 (complete mastery). These self-assessment marks will be kept on record and checked randomly, and we will discourage students from undertaking the Intro R Test and the Statistics Test if their self-assessment scores are consistently low.
If you realise you are struggling, seek assistance from the lecturer or teaching assistant early (ideally on the day). Consistent, candid self-assessment strongly correlates with later performance in the Intro R Test, the Statistics Test, and the combined Exam. The goal is to align your learning strategies with course expectations and build a foundation for success.
For the daily self-assessment tasks to be effective, you must work alone on all of them.
For more detail, see the module instructions provided in class.
5.3 Formal Assessment
TBD.
5.4 Submission of Assignments
- The Progress Portfolios must be submitted on the day of the lectures they cover.
- The Self-Assessment Tasks must be submitted by the date specified in the timetable.
6 Data Used
All the data required for BCB744 may be downloaded here. After you have downloaded the archived (.zip) data, unzip it into a folder named data at the root of your R project. This will ensure that all the data are easily accessible to you.
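The unzip step can also be done from within R. The sketch below is one possible way, not a required procedure: unpack_course_data() is a hypothetical helper, and the archive name course_data.zip in the usage comment is an assumption that you should replace with the name of the file you downloaded.

```r
# Hypothetical helper: unpack a downloaded archive into <project root>/data
unpack_course_data <- function(zip_path, project_root = getwd()) {
  # Fail early with a clear message if the archive is not where we expect it
  if (!file.exists(zip_path)) {
    stop("Archive not found: ", zip_path)
  }
  # Create the data folder if it does not exist yet
  data_dir <- file.path(project_root, "data")
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  # Extract the archive contents into the data folder
  unzip(zip_path, exdir = data_dir)
  invisible(data_dir)
}

# Example call (file name is an assumption):
# unpack_course_data("course_data.zip")
# list.files("data", recursive = TRUE)
```

If the archive already contains a top-level data folder, extract it at the project root instead, so that you do not end up with data/data.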
6.1 World Ocean Atlas (WOA) data used in this module
Throughout this module we will repeatedly use a small, teaching-focused extract of the World Ocean Atlas 2018 (WOA18) climatology for the broader Southern Africa region.
- Processed teaching dataset (CSV): data/SAMOS/processed/woa18_sa_core_1deg_monthly.csv
- Data dictionary: data/SAMOS/processed/woa18_sa_core_1deg_monthly_DICTIONARY.md
- Provenance: a matching *_PROVENANCE.md file is written alongside the dataset.
The dataset is in tidy (long) format and is designed to support the same examples across physical, chemical, and biological oceanography. Key columns include lon, lat, depth_m, month (0 = annual climatology; 1–12 = monthly), variable, value, and unit.
Variables currently included are:
- temperature
- salinity
- dissolved_oxygen
- nitrate
- phosphate
- silicate
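As a minimal sketch of how the long format might be queried, the example below filters one variable at one depth for the annual climatology. Several things here are assumptions: surface_annual() is a hypothetical helper, depth_m == 0 is taken to mean the sea surface, and the toy rows (including their values and unit strings) are invented so that the example runs without the real file. They are not WOA18 data.

```r
library(dplyr)

# Hypothetical helper: annual-climatology surface values of one variable
surface_annual <- function(woa, var_name) {
  woa |>
    filter(variable == var_name, month == 0, depth_m == 0) |>
    select(lon, lat, value, unit)
}

# Toy rows with the documented columns (invented values, NOT real WOA18 data)
woa_toy <- data.frame(
  lon      = c(18.5, 18.5, 18.5, 25.5),
  lat      = c(-34.5, -34.5, -34.5, -34.5),
  depth_m  = c(0, 0, 100, 0),
  month    = c(0, 1, 0, 0),
  variable = c("temperature", "temperature", "temperature", "salinity"),
  value    = c(17.2, 19.8, 12.4, 35.3),
  unit     = c("degC", "degC", "degC", "PSU")
)

surface_annual(woa_toy, "temperature")

# With the real file in place, the same helper would be applied as:
# woa <- read.csv("data/SAMOS/processed/woa18_sa_core_1deg_monthly.csv")
# surface_annual(woa, "temperature")
```

Because every variable shares the same columns, swapping "temperature" for "nitrate" or any other listed variable needs no other change.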
6.2 Built-in Data
R also gives you access to many built-in datasets that are useful for practising our R skills. To find out which datasets are available to you on your system, execute the following command. Help files for each of the datasets are also available:
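The command meant here is most likely base R's data() function. The sketch below assumes that, and restricts the listing to the datasets package so that the output stays short; the object name available is illustrative.

```r
# List the datasets that ship with the 'datasets' package
available <- data(package = "datasets")

# Show the first few dataset names and their titles
head(available$results[, c("Item", "Title")])

# To list datasets from every installed package instead, run:
# data(package = .packages(all.available = TRUE))

# Help files are opened with ? followed by the dataset name, for example:
# ?airquality
```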
It is important to use these (or any) datasets to practise your R skills. Actively engaging with my web pages and practising on the included datasets will make the difference between a 60% average mark for the module and a mark in excess of 80%.
7 Prerequisites
You should have moderate numerical literacy, but prior programming experience is not required. In all sciences, practical problem-solving skills and tenacity in the face of challenges are crucial for success. Scientific disciplines constantly evolve and present new and complex problems that require creative and innovative solutions. You will have to demonstrate agile and adaptive approaches to solving challenges, and you must be able to break down complex problems into smaller parts and approach them systematically. You must also be able to identify and overcome roadblocks, and be persistent in your efforts to find a solution. These attributes will allow you to be effective in this module.
8 Method of Instruction
The workshop is designed to be as interactive as possible, so while you are working on exercises the tutor and I will circulate among you and engage with you to help you understand any material and associated code you are uncomfortable with. Often this will result in discussions of novel applications and alternative approaches to the data analysis challenges you are required to solve. More challenging concepts might emerge during the Tasks and Assignments (typically these will be submitted the following day), and any such challenges will be dealt with in class before we move on to new concepts.
Although the module ultimately supports the application of biologically-oriented statistics, a large part of it is also about programming. It is up to you to take your coding skills to the next level and move beyond what I teach in class. Coding is a bit like learning a language and, as such, programming is a skill that is best learned by doing.
9 Learning
9.1 Collaboration
Please refer to my advice about how to learn.
Collaborative learning provides an opportunity to work together and learn from each other. It develops communication, teamwork, and leadership, and it can deepen your understanding of the subject matter. Discuss the BCB744 module activities with your peers as you work on them. Use the WhatsApp group set up for the module for discussion (I may assist there if your questions or comments are relevant to the whole class). A better option is to use GitHub Issues. Ask questions, answer questions, and share ideas liberally. Please identify your work partners by name on all assignments (if you decide to work in pairs).
At the same time, you are individually responsible for the submitted work. Collaboration means discussing ideas, approaches, and interpretations; it does not include sharing or reusing code, text, or outputs. Anything you submit must be your own, and any external material (including AI-generated code or web-sourced snippets) must be clearly cited. Plagiarism is a serious offence and will be dealt with decisively. The consequences of cheating are severe: they range from 0% for the assignment or exam to dismissal from the course for a second offence.
9.2 Found Code
A huge volume of code is available on the web and it can be adapted to solve your own problems. You may make use of any online resources (e.g., StackOverflow, a widely used source of discussion about R code), but you MUST clearly indicate (cite) that your solution relies on found code, regardless of how much you have modified it to your own needs. Reused code that is discovered via a web search and not explicitly cited is plagiarism and will be treated as such. On assignments you may not directly share code with your peers in this workshop.
9.3 AI Tools
The 2025 BSc (Hons) cohort will be the first to experience the use of AI tools in the BCB744 module. The use of AI tools is a new development and it is important that you are exposed to these tools. The use of AI tools will be limited to the use of the OpenAI ChatGPT tool, which may be used to generate ‘proto-code’ that will assist you in becoming familiar with the R language. We will explore ideas together, and the mark allocation to tasks and assignments will be adjusted accordingly.
10 Software
In this course you will rely entirely on R running within the RStudio IDE. The use of R is covered in the Introduction to R section of this site (start here: R and RStudio).
Additionally, the very basics (about R, RStudio, packages, their installation, etc.) can also be found on the ModernDive website. A slightly longer, more detailed account of the installation process and the very basics is provided on the DataCamp platform.
ModernDive also provides a nice overview of using R for data science.
For more in-depth coverage of the R language, refer to R Master Hadley Wickham’s pages. There you will find everything you need to know in a well-thought-through presentation. Thoroughly working through this material, page by page, will quickly make you an R Master yourself (well, almost).
11 Computers
You are encouraged to bring your own laptop and to install the necessary software before the module starts. Limited support can be provided if required, but in the end, the onus is on you to understand how your computer works (from the filesystem through to dealing with software installation issues).
12 Attendance
This workshop-based, hands-on course can only deliver acceptable outcomes if you attend all classes. The schedule is set and cannot be changed. Sometimes an occasional absence cannot be avoided. Please be courteous and notify me or the tutor in advance of any absence. If you work with a partner in class, notify them too. Keep up with the reading assignments while you are away and we will all work with you to get you back up to speed on what you missed. If you do miss a class, however, the assignments must still be submitted on time (also see Late submission of CA).
Since you may decide to work in collaboration with a peer on tasks and assignments, please keep this person informed at all times in case some emergency makes you unavailable for a period of time. Someone might depend on your input and contributions, so do not leave them in the lurch and unable to complete a task in your absence.
13 Support
It is expected that some tricky aspects of the module will take time to master, and the best way to master problematic material is to practise, practise some more, and then to ask questions. Trying for 10 minutes and then giving up is rarely enough. I will be more sympathetic to your cause if you can demonstrate sustained effort before asking me. When you ask questions about a challenge, explain the approaches you tried and how they failed. I will not help you if you have not tried to help yourself first (maybe with advice from friends). There will be time in class to do this, typically before we embark on a new topic.
Should you require more time with me, find out when I am ‘free’ and set an appointment by sending me a calendar invitation. I am happy to meet with you via Zoom, but I prefer to meet face-to-face in my office.
Citation
@online{a._j.2026,
author = {Smit, A. J.},
title = {SAMOS {Core} {Module} 3.2 \& 3.4},
date = {2026-02-07},
url = {http://samos-r.netlify.app/},
langid = {en}
}