About the Trainings
Most class sessions have both interactive Modules courtesy of Data Camp¹ and Walkthroughs created by me. You will need to work through them after doing the readings and reviewing the corresponding content (if applicable). The lessons are a central part of the class and focus on the tidyverse family of packages, though these approaches are certainly not the only ways to wrangle, clean, analyze, and visualize data in R.
Advice
Carve out some time every day to go through these. If you try to complete everything in one sitting, it will probably be overwhelming! However, if you are already familiar with some modules, please feel free to work ahead.
Grading
The ultimate point of Data Camp is to familiarize you with an environment that you have likely never seen or been exposed to. While you should absolutely go through each module, there is no expectation that you will get everything right. In fact, the points that you earn have no bearing on how you are assessed, so please use hints as needed! As with all things data science, you’ll learn by doing. If you sit at either extreme when it comes to work (i.e., primarily a perfectionist or mostly careless), then the modules will likely prove to be a challenge. You are unlikely to comprehend everything by pushing beyond your limit or, conversely, by assuming it will just come to you, so please work hard but also take breaks, swear², look on the Internet, ask peers, or reach out for help. Your score is predicated on putting in a solid effort rather than on getting everything perfect, because perfection is not realistic when it comes to data.
Data Camp Schedule
A tentative schedule is given below. The Course and Chapter names represent Data Camp titles³:
Required
Modules that do not require a corresponding task will be assessed only on the successful completion of the Data Camp course.
Modules that require a task will be assessed on both the successful completion of the Data Camp course and the corresponding assessment to be submitted via eCampus.
Link | Due | Required | Task | Module | Chapters |
---|---|---|---|---|---|
Week 1 | 8/30/22 | | | Introduction to R | Intro to basics; Vectors; Matrices; Factors; Data Frames; Lists |
Week 2 | 9/6/22 | | | Introduction to the Tidyverse | Data wrangling; Data visualization; Grouping and summarizing; Types of visualizations |
Week 2 | 9/6/22 | | | Introduction to Data Visualization with ggplot2 | Explore your data; Tame your data; Tidy your data; Transform your data |
Week 3 | 9/20/22 | | | Intermediate Data Visualization with ggplot2 | Statistics; Coordinates; Facets; Best Practices |
Week 4 | 10/4/22 | | | Visualization Best Practices in R | Proportions of a whole; Point data; Single distributions; Comparing distributions |
Week 5 | 10/18/22 | | | Unsupervised Learning in R | Unsupervised Learning in R; Hierarchical clustering; Dimensionality reduction with PCA; Putting it all together with a case study |
Week 6 | 11/1/22 | | | Introduction to Text Analysis in R | Wrangling Text; Visualizing Text; Sentiment Analysis; Topic Modeling |
Week 7 | 11/15/22 | | | Communicating with Data in the Tidyverse | Custom ggplot2 themes; Creating a custom and unique visualization; Introduction to RMarkdown; Customizing your RMarkdown report |
Week 8 | 11/29/22 | | | Analyzing Social Media Data in R | Understanding Twitter data; Analyzing Twitter data; Visualize Tweet texts; Network Analysis and putting Twitter data on the map |
Recommended
The following module is optional but highly recommended.
| Required | Task | Module | Chapters |
|---|---|---|---|
| | | Intermediate R | Conditionals and Control Flow; Loops; Functions; The apply family; Utilities |
Extra Credit
The following modules are optional and may count as extra credit, contingent on the successful completion of the Data Camp course and the corresponding assessment to be submitted via eCampus. Please note that each subsequent module depends on the previous one.
Due | Required | Task | Module | Chapters |
---|---|---|---|---|
12/9/22 | | | Network Analysis in the Tidyverse | The hubs of the network; In its weakness lies its strength; Connection patterns; Similarity clusters |
12/9/22 | | | Supervised Learning in R: Classification | k-Nearest Neighbors (kNN); Naive Bayes; Logistic Regression; Classification Trees |
12/9/22 | | | Predictive Analytics using Networked Data in R | Introduction, networks and labelled networks; Homophily; Network Featurization; Putting it all together |
R Tasks
In some weeks you will be expected to complete an additional R task, which is indicated in the Task column of the table above. Collectively, these tasks serve as the R Data EDA noted on the syllabus.
Working Ahead
By no means do you have to wait for a particular module to be assigned. If you wish to enroll in a training, whether it is assigned or not, simply search for the name of that course on the Data Camp site. For modules assigned for this course, you will receive credit after the due date has passed.
Need Help?
While I am happy to meet face-to-face, it is just as easy to schedule a Zoom session using the calendar or to notify me on Slack by adding @Dr. Abhik Roy to your message.