#Loading raw data into R
read_csv(here("death-rate-from-alzheimers-other-dementias-ghe.csv"))
#Reading dataset into a dataframe
df<-read.csv("death-rate-from-alzheimers-other-dementias-ghe.csv")PSY6422-FinalProject
Death rates from Alzheimer’s disease and other forms of dementia across Europe from 2000-2021.
Sophie Ryan
04/12/25
Background and Data Origins
As a postgraduate Masters student studying cognitive neuroscience and human neuroimaging I am interested in the impact of neurodegenerative diseases on health and populations. This inspired me to create a visualisation looking at how deaths from Alzheimer’s disease and other forms of dementia are increasing or decreasing and how this specifically is impacting health and therefore death rates. The raw data set was obtained from Our World Data website which provided an estimate of the annual number of deaths from Alzheimer’s disease and other forms of dementia per 100,000 people from 2000 to 2021 globally, both men and women. This was based on Data from the World Health Organisation (2024) that was then processed by Our World Data to provide the data set used in this project.
A link to the data set can be accessed Here
My visualisation will focus on answering whether there has been an increase in deaths from Alzheimer’s disease and other dementia’s across Europe from the year 2000-2021 and whether one specific country has a higher death rate than others due to these neurodegenerative diseases.
Research Question: “How have death rates from Alzheimer’s disease and other forms of dementia’s changed over time and how has this differed across countries in Europe?”
The Project
Data Preparation:
Before starting any data preparation the packages needing to be installed and loaded are:
tidyverse
here
dplyr
ggplot2
plotly
Loading in the raw data set before I can begin cleaning and organising.
#Showing first 10 rows of the raw data
head(df,10) Entity Code Year
1 Afghanistan AFG 2000
2 Afghanistan AFG 2001
3 Afghanistan AFG 2002
4 Afghanistan AFG 2003
5 Afghanistan AFG 2004
6 Afghanistan AFG 2005
7 Afghanistan AFG 2006
8 Afghanistan AFG 2007
9 Afghanistan AFG 2008
10 Afghanistan AFG 2009
Death.rate.from.alzheimer.disease.and.other.dementias.among.both.sexes
1 4.62
2 4.68
3 4.73
4 4.75
5 4.81
6 4.93
7 5.05
8 4.91
9 4.83
10 4.90
Data Cleaning
The current raw data includes death rates from Alzheimer’s and other dementia’s across every country in the world. I want to focus just on countries in Europe. Therefore I am going to remove the rows that are irrelevant to my analysis and save a new data set with just the death rates from Alzheimer’s and other dementia’s across Europe from 2000-2021.
#1. Define a vector of European countries to give me something to reduce my raw data based upon, this was quite tedious but i couldnt find a better way of doing this with such a large data set
european_countries<- c(
"Albania", "Andorra", "Armenia", "Austria", "Azerbaijan",
"Belarus", "Belgium", "Bosnia and Herzegovina", "Bulgaria",
"Croatia", "Cyprus", "Czechia", "Denmark", "Estonia", "Finland",
"France", "Georgia", "Germany", "Greece", "Hungary", "Iceland",
"Ireland", "Italy", "Kazakhstan", "Latvia", "Lithuania", "Luxembourg", "Malta", "Moldova", "Monaco", "Montenegro", "Netherlands", "North Macedonia", "Norway", "Poland", "Portugal", "Romania", "Russia", "San Marino", "Serbia", "Slovakia", "Slovenia", "Spain", "Sweden", "Switzerland", "Turkey", "Ukraine",
"United Kingdom"
)
#2. Filter the data frame
#Using the function 'filter' to keep all rows where the country in the "Entity" column is also present in the 'european_countries' vector that was created
europe_alzheimers_df<-df %>%
filter(Entity %in% european_countries)
#3. Displaying first few rows of the new data frame to check it has removed the relevant data
print(head(europe_alzheimers_df)) Entity Code Year
1 Albania ALB 2000
2 Albania ALB 2001
3 Albania ALB 2002
4 Albania ALB 2003
5 Albania ALB 2004
6 Albania ALB 2005
Death.rate.from.alzheimer.disease.and.other.dementias.among.both.sexes
1 20.83
2 14.04
3 16.57
4 19.36
5 19.53
6 20.77
#4. Counting the number of rows in the original data frame and the new data frame to check that the intended rows were removed
print(paste("Original rows:", nrow(df)))[1] "Original rows: 4422"
print(paste("European rows:", nrow(europe_alzheimers_df)))[1] "European rows: 1056"
#5. Saving the new data set with just European countries in it
write.csv(europe_alzheimers_df, "european_alzeimers_deaths.csv", row.names = FALSE)
#6 Viewing the new clean data set in a table format
view(europe_alzheimers_df)
#7 Renaming the column for countries
europe_alzheimers_deaths<- europe_alzheimers_df %>% #creating final data set vector
rename(Country = Entity, #renaming the original column name ('Entity') as 'Country'
DeathRate = Death.rate.from.alzheimer.disease.and.other.dementias.among.both.sexes)
#8 Viewing final clean data set
view(europe_alzheimers_deaths)Data Visualisation 1
Firstly I experimented with line graphs to see how my data could be visualised.
ggplot(
europe_alzheimers_deaths, #calling the data set
aes(x = Year, y = DeathRate, colour = Country, group = Country)) +
#Adding x and y axis and grouping data points by country
geom_line() + #adding the lines
labs (title = "Alzheimers deaths by European country from 2000-2021",
x = "Year",
y = "Number of Deaths",
colour = "Country") +
#adding title and labels for x and y axis and assigning a different colour for each country
theme_minimal()
This was just a basic line graph for me to get an idea on how I can work with the data I have and what sort of graph I could create. This one is very simple and I don’t like the proportions and it does not look that great in terms of positioning, colour and just general visability so I am going to try and advance this visualisation into something clearer and more precise.
alzheimer_plot<- ggplot( #assigning the plot a name
data = europe_alzheimers_deaths, #calling the data
aes(
x = Year,
y = DeathRate,
group = Country,
colour = Country
) #adding x and y axis and grouping values by country and assigning a default colour palette giving a different colour for each country
) +
geom_line(
linewidth = 0.8, #setting thickness of lines
alpha = 0.8 #controlling transparency of the lines
) +
geom_point( #adds data points across the lines
size = 1.5,
alpha = 0.8
) +
labs(
title = "Alzheimer's and Dementia Death Rates in European Countries Over Time",
subtitle = "Death rate per 100,000 population, by country and year.",
x = "Year",
y = "Death Rate (per 100,000)",
colour = "Country"
) + #labelling the graph
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"), #changing size of title and making it bold
plot.subtitle = element_text(size = 12), #changing size of subtitle text
axis.title = element_text(size = 12), #assigning text size for the axis labels
legend.position = "right", #making the position of the legend on the right of the graph
panel.grid.minor = element_blank() #hiding minor grid lines
)
print(alzheimer_plot) #showing the graph 
This graph is better in terms of visual appearance, labels and detail, but the legend is too big and the graph is squashed. The colours of the lines and points are too similar and there’s just too much going on in the graph to read the data clearly, so its still unclear.
alzheimer_plot<- ggplot(data = europe_alzheimers_deaths, #calling the data
aes(
x = Year,
y = DeathRate,
group = Country,
colour = Country,
#adding x and y axis, grouping by country and making each country a different colour, I couldn't find a colour palette that covers enough colours to assign a different clear colour to each country better than the default - this is an issue with this type of graph I started to realise
)) +
geom_line(
linewidth = 0.4,
alpha = 0.8
) +
geom_point(
size = 1.0,
alpha = 0.8
) + #assigning line and point width and transparency
labs(
title = "Alzheimer's and Dementia Death Rates in European Countries Over Time",
subtitle = "Death rate per 100,000 population, by country and year.",
x = "Year",
y = "Death Rate (per 100,000)",
colour = "Country"
) + #labelling titles and axis
theme_minimal() +
theme(
legend.position = "bottom", #changing legend position to underneath the graph to make the graph less squashed
legend.direction = "horizontal",
legend.text = element_text(size = 7), #assigning text size inside the legend
legend.title = element_text(size = 8, face = "bold"), #assigning title text size
legend.key.size = unit(0.3, "lines"), #making the key size smaller
legend.box.margin = margin(t = -15, r = 0, b = 0, l = 0), #decreasing amount of empty space around the legend
legend.box.spacing = unit(0.5, "cm"), #changing space between the legend and the graph
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12),
axis.title = element_text(size = 12),
panel.grid.minor = element_blank()
) + #assigning text size for labels and titles and removing minor grid lines
guides(colour = guide_legend(ncol = 8)) #organising legend into 8 columns to make it as small in height as possible
print(alzheimer_plot)
After experimenting with the line graphs, I decided this data is too dense and although this final visualisation is better for readability i think it is still quite unclear and the data is all squashed together at the bottom, with similar colours, making them difficult to read and distinguish between. Therefore I decided to create a heat map which is better for dense data sets to see if this would be a better visualisation.
Data Visualisation 2
alzheimers_heatmap<- europe_alzheimers_deaths %>% #naming the graph
mutate(Country = fct_reorder(Country, DeathRate, .fun = 'mean')) %>% #reordering data by each countries' mean death rate to make the visualisation clearer with the changes in colour
ggplot(mapping = aes(x = Year, y = Country, fill = DeathRate)) + #adding x and y axis and what the data is being grouped by
geom_tile() +
scale_fill_viridis_c(option = "turbo") + #adding colour palette
labs(title = "Heatmap of Death Rates from Alzheimers and
other dementias across Europe 2000-2021",
subtitle = "Countries ordered by mean death rate per 100,000",
x = "Year", y = "Country", fill = "Death Rate") + #labelling title and axis and legend
theme_minimal() +
theme(
plot.title = element_text(size= 16, face = "bold"),
plot.subtitle = element_text(size = 12),
axis.title = element_text(size = 12),
axis.text.y = element_text(size = 6))
#adjusting size and look of text and labels on the graph
alzheimers_heatmap #viewing the heatmap
I think this heat-map is a great way to view the type of dense data that I chose. From this you can clearly see each country name separately and each colour is clear making the graph easy to read, including a clear colour legend. This also highlights very well the much higher death rates in Monaco and the drastic increase in the death rates in this country from 2010 - 2020. Additionally, we can see how death rates have increased over the years as the colours become lighter approaching 2020, with the highest death rate countries grouped at the top. This is a much better way to visualise this data and make it easier to read whilst also being pleasing to look at.
Lastly to complete my final visualisation I want to make this graph interactive.
Final Visualisation
alzheimers_heatmap<- europe_alzheimers_deaths %>% #naming the graph
mutate(Country = fct_reorder(Country, DeathRate, .fun = 'mean')) %>% #reordering data by each countries' mean death rate to make the visualisation clearer with the changes in colour
ggplot(mapping = aes(x = Year, y = Country, fill = DeathRate)) + #adding x and y axis and what the data is being grouped by
geom_tile() +
scale_fill_viridis_c(option = "turbo") + #adding colour palette
labs(title = "Heatmap of Death Rates from Alzheimer's and
other dementia's across Europe 2000-2021",
subtitle = "Countries ordered by mean death rate per 100,000",
x = "Year", y = "Country", fill = "Death Rate") + #labelling title, axis and legend
theme_minimal() +
theme(
plot.title = element_text(
size= 14, #size of title
face = "bold", #making the title bold
margin = margin(b = 20), #adding 20 points of space below the title
hjust = 0.5), #centering title
plot.subtitle = element_text(
size = 10, #size of the subtitle
hjust = 0.5), #centering the subtitle
axis.title = element_text(size = 12),
axis.text.y = element_text(size = 6))
#adjusting size and look of labels on the graph
interactive_alzheimers_heatmap<- ggplotly(alzheimers_heatmap) #converting plot to plotly
interactive_alzheimers_heatmap <- interactive_alzheimers_heatmap %>%
layout(
title = list(
text = paste0(
"<b><span style = 'font-size:16px'>Heatmap of Death Rates from Alzheimer's and other dementia's across Europe 2000-2021</span></b><br>",
"<b><span style='font-size:12px'>Countries ordered by mean death rate per 100,000</span></b>"), #manually adding the title and subtitle back into the graph as when I did it before it removed the subtitle and adjusting sizes of the text
y = 0.95, #moving title slightly down
x = 0.5, #centering horizontally
xanchor = 'centre',
yanchor = 'top',
pad = list(b = 15)), #adding padding below the text
margin = list(t = 50)) #increasing top margin to prevent overlapping of the plot
interactive_alzheimers_heatmapSummary
My final visualisation shows the death rates from Alzheimer’s disease and other dementia’s across countries in Europe from the year 2000 - 2021, in a way that it can clearly be seen what countries have the highest death rates and how countries death rates have increased over the years. From this project, I have learnt how to handle large data sets which was intimidating before starting this project. I have also learnt to work around errors in code and try out different methods to do what I want it to do. During the visualisation creation, I was able to experiment with different functions and different types of plots and also make my final visualisation interactive, after having no prior experience in coding. If I had more time I would create a graph for each continents death rates, possibly trying out different ways of showing such dense data and then I would have liked to compare each continents average death rate to get an idea of the death rates globally.