Educational Research and Statistical Analysis Training for Teachers - Collection of R Practices from Sessions 13–20

힘센캥거루
January 5, 2026

I used to wonder whether I really needed to learn R when I already knew Python.

Through this training, I realized that there’s actually no need to use Python when doing research.

In Python, you'd have to fit the regression with numpy, draw the graphs, compute the p-values, and handle everything else yourself, but in R you can do it all with just lm() and summary().

So today, I'll review the R exercises we've covered so far and work through them with real data.

1. Example data

The example data is a dataset of U.S. students’ test scores uploaded on Kaggle.

If you don't have a Kaggle account, I've attached a link below.

This dataset was created to examine the effects of factors such as parental background and test preparation courses on students’ academic performance.

Below is a preview of the dataset.

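To explain briefly: gender is the student's sex; race/ethnicity is an anonymized ethnicity group (group A to group E); parental level of education is the parents' education level; lunch indicates whether the student receives a standard or a free/reduced-price school lunch; and test preparation course indicates whether the student completed a test preparation course. The remaining columns are the math, reading, and writing scores.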


2. Opening a CSV file in RStudio

The function for loading a CSV file in R is read.csv().

If you find it annoying to type the file path, you can just click the file, copy it, and paste it into the editor; the path will be inserted.

Or you can use file.choose() to pick the file in a file-selection dialog.

For reference, you can run the current line of code with Ctrl + Enter (Cmd + Enter on Mac).

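data <- read.csv("file_path")  # replace file_path with the path to your CSV file
# data <- read.csv(file.choose())  # alternative: pick the file in a dialog

head(data)  # show the first six rows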
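To see why the code below refers to columns as math.score rather than "math score", here is a minimal sketch that writes a tiny CSV with Kaggle-style headers (spaces and a slash) to a temporary file and reads it back. The three rows are invented for illustration, and only five of the dataset's columns are included. On Windows, write a real path with forward slashes (C:/Users/...) or doubled backslashes, because a single backslash starts an escape sequence in an R string.

```r
# Made-up rows with Kaggle-style headers, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "gender,race/ethnicity,math score,reading score,writing score",
  "female,group B,60,65,70",
  "male,group C,75,80,78",
  "female,group A,50,55,52"
), path)

data <- read.csv(path)

# read.csv() passes the header through make.names() (check.names = TRUE by
# default): "math score" becomes math.score, "race/ethnicity" becomes race.ethnicity
print(names(data))
str(data)
```

If you'd rather keep the original names, read.csv(path, check.names = FALSE) leaves them as they are, but then you have to wrap them in backticks inside formulas.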

3. Linear regression analysis

Now let’s run a simple linear regression with this dataset.

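lm() takes a model formula, dependent_variable ~ independent_variable, followed by the data frame: lm(dependent_variable ~ independent_variable, data = dataset). Once the data frame is passed, you can refer to its columns by name without the data$ prefix.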

For example, if you want to look at the relationship between math scores and writing scores, you can run the following:

m1 <- lm(math.score ~ writing.score, data = data)  # math score explained by writing score
summary(m1)

With this, the linear regression is done.

Without any extra settings, summary() prints output detailed enough for a research paper: the coefficient estimates with their standard errors, t-values, and p-values, plus R² and the F-statistic.

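As a rough check of what lm() is doing, the sketch below fits the same simple regression on simulated stand-in data (the column names mimic the Kaggle file; the values are random, not the real scores). It shows that the data$ form and the data = form give identical coefficients, that the slope equals cov(x, y) / var(x), and how to pull the coefficient table and 95% confidence intervals out of the fit with coef(summary()) and confint().

```r
# Simulated stand-in for the Kaggle data (hypothetical values, not the real file)
set.seed(1)
n <- 200
writing <- round(runif(n, 30, 100))
data <- data.frame(
  writing.score = writing,
  math.score = round(10 + 0.8 * writing + rnorm(n, sd = 8))
)

# Idiomatic form: column names in the formula, data frame via `data =`
m1 <- lm(math.score ~ writing.score, data = data)

# The same fit written with data$ inside the formula (works, just longer)
m1_dollar <- lm(data$math.score ~ data$writing.score)

# For simple regression, the least-squares slope is cov(x, y) / var(x)
slope_by_hand <- cov(data$writing.score, data$math.score) / var(data$writing.score)

print(coef(m1))
print(slope_by_hand)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
print(coef(summary(m1)))
print(confint(m1))  # 95% confidence intervals for intercept and slope
```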

4. Drawing charts

Drawing graphs is also extremely simple.

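Just typing plot(m1) makes R draw the four standard diagnostic plots for the model: residuals vs. fitted values, a normal Q-Q plot, a scale-location plot, and residuals vs. leverage.

If you want a specific plot instead, pass the x-axis values and the y-axis values in that order, separated by a comma, for example plot(data$writing.score, data$math.score).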

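Here is a small sketch of both kinds of chart on simulated stand-in data (random values, not the Kaggle scores): a scatter plot built with plot(x, y) plus the fitted line from abline(), and the four diagnostic plots of plot(m1) arranged on one page with par(mfrow = c(2, 2)).

```r
# Simulated stand-in data (hypothetical, for illustration only)
set.seed(2)
data <- data.frame(writing.score = round(runif(150, 30, 100)))
data$math.score <- round(12 + 0.75 * data$writing.score + rnorm(150, sd = 9))

m1 <- lm(math.score ~ writing.score, data = data)

# plot(x, y): x-axis values first, then y-axis values
plot(data$writing.score, data$math.score,
     xlab = "Writing score", ylab = "Math score")
abline(m1, col = "red")  # overlay the fitted regression line

# Show the four default diagnostic plots of plot(m1) on a single 2 x 2 page
op <- par(mfrow = c(2, 2))
plot(m1)
par(op)  # restore the previous layout
```

In an interactive session, plot(m1) without the par() call asks you to press Enter before each of the four plots.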

5. Multiple regression analysis

With several independent variables, list them all on the right-hand side of the lm() formula, separated by +.

For example, if you want to see how reading and math scores affect writing scores, you can analyze it as follows:

m2 <- lm(writing.score ~ reading.score + math.score, data = data)  # two predictors
summary(m2)
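A sketch on simulated data (hypothetical values in which reading and math scores are deliberately correlated) comparing a one-predictor model with the two-predictor m2. R² can only go up when a predictor is added, so the adjusted R² is the fairer comparison; the correlation between the two predictors is also worth checking, since strongly correlated predictors inflate each other's standard errors.

```r
# Simulated stand-in data (hypothetical), with reading and math correlated
set.seed(3)
n <- 300
reading <- round(runif(n, 30, 100))
math <- round(0.7 * reading + rnorm(n, 20, 10))
data <- data.frame(
  reading.score = reading,
  math.score = math,
  writing.score = round(5 + 0.6 * reading + 0.3 * math + rnorm(n, sd = 5))
)

m_reading <- lm(writing.score ~ reading.score, data = data)
m2 <- lm(writing.score ~ reading.score + math.score, data = data)

# Adding a predictor never lowers R^2, so look at adjusted R^2 as well
print(c(r2_reading_only = summary(m_reading)$r.squared,
        r2_both = summary(m2)$r.squared,
        adj_r2_both = summary(m2)$adj.r.squared))

# Correlation between the predictors (high values signal multicollinearity)
print(cor(data$reading.score, data$math.score))
```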

6. Handling categorical variables (non-numeric data)

Categorical variables are variables that divide data into qualitative groups or categories.

They are used to handle non-numeric data such as gender or education level.

Here, let’s use the simplest example, gender.

We'll use the ifelse() function to add a dummy variable called gender1 to data.

data$gender1 <- ifelse(data$gender == "male", 0, 1)
# 0 if the condition (male) is true, 1 if it is false (female)
head(data)

After doing this, if you check the table, you’ll see that a new gender1 column has been created, with 1 for female and 0 for male.

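To make the gender1 coefficient concrete, here is a sketch on a six-row made-up dataset (the scores are invented). With a 0/1 dummy, the intercept is the mean score of the group coded 0 (male) and the slope is the difference between the group means, female minus male.

```r
# Hypothetical mini dataset (made-up scores, for illustration only)
data <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  math.score = c(62, 70, 58, 74, 66, 68)
)

# Same coding as above: 0 if male, 1 if female
data$gender1 <- ifelse(data$gender == "male", 0, 1)

m_dummy <- lm(math.score ~ gender1, data = data)

# Intercept = mean of the 0 group (male); slope = mean(female) - mean(male)
group_means <- tapply(data$math.score, data$gender, mean)
print(coef(m_dummy))
print(group_means)
```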

Now you can use this to run a linear regression analysis.

Interestingly, even if you don’t do this and just put gender in directly, the analysis still works.

m3 <- lm(math.score ~ gender, data = data)  # gender is a character column
summary(m3)
plot(m3)

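This works because R automatically converts the character column into a factor and creates a 0/1 dummy variable for it, much as we did above. One difference: R uses the alphabetically first level (female) as the reference, so its dummy is 1 for male, the reverse of gender1, and the sign of the coefficient flips.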

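The sketch below, on the same kind of made-up mini dataset, compares the hand-made gender1 dummy with R's automatic coding. Because R picks the alphabetically first level as the reference, the automatic coefficient is named gendermale and has the opposite sign; relevel() lets you choose the reference level yourself.

```r
# Hypothetical mini dataset (made-up scores, for illustration only)
data <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  math.score = c(62, 70, 58, 74, 66, 68)
)
data$gender1 <- ifelse(data$gender == "male", 0, 1)  # 1 = female

m_manual <- lm(math.score ~ gender1, data = data)
m_auto <- lm(math.score ~ gender, data = data)

# R turns the character column into a factor; "female" comes first
# alphabetically, so it is the reference and the dummy is "gendermale"
print(coef(m_auto))
print(coef(m_manual))  # same size of slope, opposite sign

# To make male the reference instead, set it explicitly with relevel()
data$gender_f <- relevel(factor(data$gender), ref = "male")
m_relevel <- lm(math.score ~ gender_f, data = data)
print(coef(m_relevel))  # dummy is now "gender_ffemale", matching gender1
```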

Gender is easy because there are only two categories, but it's a bit different for variables with more categories, such as parental education level or race/ethnicity group.

If there are n categories, you need n-1 dummy variables: one category serves as the reference, and each dummy compares one of the other categories with it.

You can create them manually, but it doesn’t seem like a bad idea to just entrust your soul to R.

m4 <- lm(math.score ~ race.ethnicity, data = data)  # R builds the n-1 dummies itself
summary(m4)
plot(m4)
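To see the n-1 dummy variables R creates, here is a sketch on a ten-row made-up dataset with the same five group labels as race.ethnicity (the scores are invented). model.matrix() shows the design matrix lm() builds, and each coefficient turns out to be that group's mean minus the mean of the reference group (group A).

```r
# Hypothetical mini dataset with five groups (made-up scores)
data <- data.frame(
  race.ethnicity = rep(c("group A", "group B", "group C", "group D", "group E"), each = 2),
  math.score = c(60, 64, 63, 67, 65, 69, 68, 72, 74, 78)
)

# Design matrix: an intercept column plus n - 1 = 4 dummy columns;
# "group A" (first alphabetically) is the reference and gets no column
X <- model.matrix(~ race.ethnicity, data = data)
print(colnames(X))

m4 <- lm(math.score ~ race.ethnicity, data = data)

# Intercept = mean of group A; every other coefficient = group mean - group A mean
print(coef(m4))
print(tapply(data$math.score, data$race.ethnicity, mean))
```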

7. Calculating and using residuals with resid()

resid() returns each observation's residual: its observed value minus the value predicted by the regression line.

By looking at the residuals, you can check whether the relationship is linear and whether the variance is constant.

First fit the model, then compute its residuals and plot them against the independent variable.

m5 <- lm(math.score ~ writing.score, data = data)
res1 <- resid(m5)  # observed minus fitted, one value per student

plot(data$writing.score, res1)
abline(h = 0, lty = 2)  # dashed reference line at zero
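A minimal sketch of the residual check on simulated stand-in data (random values, not the Kaggle file). It plots residuals against fitted values, which for a one-predictor model is just a rescaled version of plotting them against writing.score, and the dashed line at zero makes a funnel shape easier to spot.

```r
# Simulated stand-in data (hypothetical, for illustration only)
set.seed(5)
data <- data.frame(writing.score = round(runif(200, 30, 100)))
data$math.score <- round(10 + 0.8 * data$writing.score + rnorm(200, sd = 8))

m5 <- lm(math.score ~ writing.score, data = data)
res1 <- resid(m5)

# A residual is observed minus fitted; with an intercept in the model,
# the least-squares residuals average to zero
print(summary(res1))

# Residuals vs. fitted values; a funnel shape suggests non-constant variance
plot(fitted(m5), res1, xlab = "Fitted math score", ylab = "Residual")
abline(h = 0, lty = 2)
```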

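Plotting the graph like this shows that the actual data is not homoscedastic: the spread of the residuals is not constant across writing scores.

In this case, you need to account for the unequal variance, for example by transforming the variables or by giving each observation a different weight (weighted least squares).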
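One way to go beyond eyeballing the plot, sketched here on simulated data whose noise grows with x (hypothetical values, not the Kaggle scores): a Breusch-Pagan test computed by hand in its studentized (Koenker) form, followed by weighted least squares through the weights argument of lm(). The weights 1 / x^2 rest on the assumption that the error standard deviation is proportional to x; with real data that choice would need its own justification. If you have the lmtest package installed, lmtest::bptest(fit_ols) should report the same statistic.

```r
# Simulated heteroscedastic data (hypothetical): the noise grows with x
set.seed(6)
n <- 1000
x <- runif(n, 1, 100)
y <- 5 + 0.5 * x + rnorm(n, sd = 0.2 * x)
fit_ols <- lm(y ~ x)

# Studentized Breusch-Pagan: n * R^2 from regressing the squared residuals
# on the predictor, compared with a chi-square with df = number of predictors
e2 <- resid(fit_ols)^2
aux <- lm(e2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(statistic = bp_stat, p.value = bp_p))

# Weighted least squares, ASSUMING the error SD is proportional to x,
# so each observation is weighted by 1 / x^2
fit_wls <- lm(y ~ x, weights = 1 / x^2)
print(summary(fit_wls)$coefficients)
```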

8. Stepwise regression in R

Stepwise regression adds or removes variables one at a time to check their influence. At each step, R's step() makes the single addition or removal that lowers the AIC the most, and it stops when no change lowers it further.

The researcher can add them manually, but R can also do it automatically.

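m7 <- lm(math.score ~ . - gender1, data = data)  # . = all other columns; gender1 duplicates gender
m8 <- step(m7, direction = "both")  # add or drop one term at a time by AIC
summary(m8)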
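A sketch of step() on simulated data (hypothetical: math.score is generated from reading.score and test.preparation.course, and a pure-noise column is added). trace = 0 hides the step-by-step log. Because step() only accepts a change that lowers the AIC, the final model's AIC is never higher than the starting one; whether the noise column survives depends on the random draw, which is a small illustration of why the selected model still needs interpreting.

```r
# Simulated data (hypothetical): math depends on reading and prep, not on noise
set.seed(7)
n <- 300
data <- data.frame(
  reading.score = round(runif(n, 30, 100)),
  test.preparation.course = sample(c("completed", "none"), n, replace = TRUE),
  noise = rnorm(n)
)
data$math.score <- round(15 + 0.7 * data$reading.score +
  5 * (data$test.preparation.course == "completed") + rnorm(n, sd = 8))

# Start from the full model; `.` stands for every other column in data
m7 <- lm(math.score ~ ., data = data)

# Search in both directions by AIC, without printing each step
m8 <- step(m7, direction = "both", trace = 0)
print(formula(m8))

# extractAIC() returns c(equivalent df, AIC) on the scale step() uses
print(extractAIC(m7))
print(extractAIC(m8))
```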

This method is easy to run, but the results are hard to interpret.

That's why people say they prefer hierarchical regression, in which the researcher enters variables in blocks, in an order chosen according to the research intent.
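For comparison, a minimal sketch of hierarchical regression on simulated data (hypothetical values): the researcher enters gender first, then adds reading.score as a second block, and anova() runs the F-test on the increase in R² between the two nested models.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(8)
n <- 300
data <- data.frame(
  gender = sample(c("female", "male"), n, replace = TRUE),
  reading.score = round(runif(n, 30, 100))
)
data$math.score <- round(20 + 6 * (data$gender == "male") +
  0.6 * data$reading.score + rnorm(n, sd = 8))

# Block 1: background variable chosen by the researcher
h1 <- lm(math.score ~ gender, data = data)
# Block 2: add reading score on top of block 1 (h1 is nested in h2)
h2 <- lm(math.score ~ gender + reading.score, data = data)

# F-test for whether block 2 explains significantly more variance
cmp <- anova(h1, h2)
print(cmp)
print(summary(h2)$r.squared - summary(h1)$r.squared)  # change in R^2
```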

9. Thoughts

I thought I could just throw everything into a stepwise regression, pick the model that explains the data best, and draw conclusions from whichever variables had the lowest p-values, but that wasn't the case.

R is convenient, but it made me realize that the researcher’s thought process is extremely important for drawing conclusions.

Before studying, I thought I could solve everything with Python without learning R, but that was a huge misjudgment.

I have come to worship R.

