Mastering Data Visualization: Creating Interactive Dashboards in R Programming

February 28, 2024

Henry Wallis

🇺🇸 United States

R Programming

Henry Wallis is a seasoned Statistics assignment expert who graduated from Duke University with a Master's degree in Computer Science. With over 8 years of experience, he has established himself as a go-to resource for solving complex data analysis.

Hire Now

R Programming Data Visualization

Key Topics

Problem Description
Assignment Solution:

Submit Your R Programming Assignment

Get a FREE Quote

Tip of the day

Avoid overfitting models by balancing complexity and predictive accuracy. Use cross-validation to ensure your model generalizes well to new data.

News

New AI-driven curriculum reshapes U.S. statistics degrees, emphasizing data ethics and real-time analysis. NSF funding boosts interdisciplinary programs blending stats with climate science and public health.

In this comprehensive data visualization project, students will delve into the world of interactive data dashboards using R programming. By working with diverse datasets and employing SAS Studio, students will learn to craft dynamic and informative visualizations. The project places a strong emphasis on understanding data relationships, patterns, and insights, enabling students to hone their skills in data analysis and visualization. From bar panels to scatterplot matrices, this hands-on experience will empower students to create interactive dashboards that effectively communicate complex information, fostering a deeper understanding of data and its significance in decision-making processes.

Problem Description

This Data Visualization using R Programmingassignment & involves creating informative graphics using SAS Studio, focusing on various datasets. The objective is to generate visualizations that reveal insights and patterns within the data. By implementing SAS Studio, students will create these visualizations and interpret their findings, demonstrating proficiency in interactive dashboard creation in R programming. This assignment empowers students to enhance their data analysis and visualization skills while summarizing their observations for better comprehension.

Assignment Solution:

Question 1:

SAS Code

PROC SGPANEL DATA = SASHELP.HEART NOAUTOLEGEND;
PANELBY BP_Status / NOVARNAME COLUMNS = 3 ONEPANEL;
VBAR Chol_Status / RESPONSE = MRW STAT = MEAN GROUP = Chol_Status;
ROWAXIS GRID;
RUN;

Fig 1: Bar Graph- Metropolitan Relative Weight by Cholesterol Status

From the graph above, it was found that the mean MRW was highest in High BP_Status. The rate keeps reducing as you move from High to normal and from normal to optimal.

Question 2:

SAS Code

TITLE 'Systolic Blood Pressure BY Weight and Sex';
PROC SGPANEL DATA = SASHELP.HEART;
VBOX systolic / CATEGORY = sex;
ROWAXIS GRID;

Fig 2: Boxplot- Systolic blood pressure by weight and sex

The data had a lot of observations that lay an abnormal distance from other values in a random sample from a population. The outliers were seen mostly for those with normal weight and those who were overweight. There was lesser dispersion of the data as seen by the thin box plots. The median is also in the middle of the boxplot indicating normal distribution in the data. The Overweight individuals (both male and female) had a higher systolic blood pressure as compared to the other weight status.

Question 3:

SAS Code

TITLE 'Mileage by Horsepower and Weight';
PROC SGSCATTER DATA=SASHELP.CARS(WHERE=(type IN ('Sedan' 'Sports' 'SUV')));
LABEL mpg_city = 'City';
LABEL mpg_highway = 'Highway';
COMPARE X = (mpg_city mpg_highway) Y = (horsepower weight) /
TRANSPARENCY = 0.3 MARKERATTRS = GRAPHDATA1(SYMBOL=CIRCLEFILLED);
RUN;

Fig 3: Scatter plot of mileage by horsepower and weight

From the comparative scatterplot constructed above, we compared the relationship between the Mileage per Gallon variables to Horsepower and weight in LBS. The analysis was conducted in four scatterplots. The second scatter plot explained the relationship between horsepower and highway. The same scale was used to measure the mileage per gallon. Weight was measured on a scale from 2000 to above 7000, while the horsepower was measured between 100 to 500. There was a negative correlation between the four factors. The strength of the relation differed. However, it was clearly seen that the relation between Weight and High negative relation.

Question 4:

SAS Code

TITLE 'Distribution of Body Mass Index (BMI) of Adult Men';
PROC SGPLOT DATA=SASHELP.BMIMEN(WHERE = (Age &gt;= 18));
HISTOGRAM BMI;
DENSITY BMI / LINEATTRS = (PATTERN = SOLID);
DENSITY BMI / TYPE = KERNEL LINEATTRS = (PATTERN = SOLID);
KEYLEGEND / LOCATION = INSIDE POSITION = TOPRIGHT ACROSS = 1;
YAXIS OFFSETMIN = 0 GRID;
RUN;

Fig 4: Histogram for the percentage of the Body mass index (BMI) of adult men

The histogram for the percentage of the Body mass index (BMI) was a classical bell-shaped, symmetric histogram with most of the frequency counts bunched between a BMI of 25 to 35 and with the counts dying off out in the tails. From the superimposed density curve, It was observed that it was always on or above the horizontal axis and had an area exactly 1 underneath it.

Question 5:

SAS Code

TITLE 'Scatterplot Matrix for HEART Data';
PROC SGSCATTER DATA=SASHELP.HEART(WHERE = (WEIGHT_STATUS = 'Normal'));
LABEL Cholesterol = 'Cholesterol';
LABEL Systolic = 'Systolic BP';
LABEL Diastolic = 'Diastolic BP';
LABEL weight = 'Weight';
MATRIX Cholesterol Systolic Diastolic weight /
TRANSPARENCY=0.7 MARKERATTRS=GRAPHDATA3(SYMBOL=CIRCLEFILLED);
RUN;

Fig 5: Scatter plot matrix for heart data

There was a single cluster for each group indicating a single group for each relationship. In both scenarios, there were outliers in the dataset for the relationships.

Question 6:

SAS Code

Fig 6: Average COVID-19 Daily New Cases by State

We realized that the 95% confidence interval was high for CA and TX. NYC and NY had the shortest 95% confidence interval. CA had an average new case of 9000, TX had an average of 6800, FL had an average of 5000.

Bonus Question:

For this question, your task is to create a Dot Plot displaying the average of COVID-19 new deaths by states (CA, FL, IL, OH, and TX) based on “Covid19DataCDC.xlsx” file.

Fig 7: Dot Plot Displaying Average COVID-19 new deaths by states

The average order for the new cases of COVID-19 for the different states in the order for the countries given are CH, IL, FL, TX, CA. We also realized that the 95% confidence interval was high for CA and TX. CH and IL had the shortest 95% confidence interval. CA had an average new case of 9000, TX had an average of 6800, FL had an average of 5000.

Related Samples

Explore our diverse collection of samples on statistics assignments, designed to provide insight and clarity on various topics. Delve into real-world examples, case studies, and solved problems to enhance your understanding and excel in statistical analysis. Whether you're a student seeking guidance or a professional looking for practical applications, our sample section offers valuable resources to aid your learning journey. Unlock the power of statistics with our comprehensive selection of illustrative materials.

See All Samples

Enhancing Credit Scoring Through Statistical Analysis: A Case Study Using German Credit Data

R Programming

Word Count

14157 Words

Writer Name:Margaret Roberts

Total Orders:345

Satisfaction rate:

Unleashing the Power of R on Probability Analysis & Inferential Statistics

R Programming

Word Count

5532 Words