Concern about the environment has grown rapidly over the last decade. As a result, Reduce, Reuse, & Recycle programs are springing up in many communities to encourage us to be more environmentally conscious.
This post is inspired by a recent conversation with a data scientist at a non-profit university. He wants to help the university publish data to its faculty and students for the betterment of the community. In one scenario, we discussed giving students access to their electricity and water usage data; he believed that access to this kind of data would encourage students to be more environmentally conscious. The conversation interested me so much that I decided to devote a Talking Shop post to this question:
“Can we motivate students to be more environmentally conscious?”
The university administration wants to increase student awareness of the environment. It hopes to inspire the student body to become more environmentally conscious by rewarding students for participating and succeeding in its monthly environmental campaigns.
As data scientists, we can offer some insight:
“If we cluster students by their participation in our environmental campaigns, we could take more targeted actions to inspire them.”
For example, if certain students participate in our recycling plastics campaign but not our water conservation one, we could send them a note thanking them for recycling and offer gift cards for participating in our next water conservation campaign.
Data & Tools
We’ll be working with a few different tools for this clustering analysis.
The university runs e-mail surveys to track participation in its environmental campaigns. Students choose to participate by enrolling their dorm room in a campaign. The university tracks data on each enrolled room, and if the room's participants meet the campaign goal (e.g., reduce water consumption), the room is added to the participant list for that campaign. The data lives in three tables:
- participant: tracks participation by dormitory room (which rooms participated in each campaign and met its goal)
- campaign: environmental campaign data (dates that a campaign ran, details about the campaign)
- student: table with student emails and their associated dormitory room
We’ll be using these three data sets throughout the post.
Once again, I’ll be using a Jupyter notebook and the Autodaas data platform for this analysis. Jupyter lets me write Python against my data, and Autodaas lets me focus on analyzing that data in Jupyter rather than on accessing or managing the database.
(The images in this post are taken from the complete Jupyter notebook analysis, which you can view here.)
At a high level, my workflow is:
- import the database into Autodaas.
- transform the data using the Autodaas Visual Query builder.
- use Python data libraries to analyze the data.
Join data in Autodaas
To get started, I import our database into Autodaas using the MySQL connector on the connections page.
After a few seconds, the tables are synced in Autodaas, and I can start working with the data.
Next, I use the Autodaas query builder to join the campaign and participant tables into a single dataset view, campaign_participant. (That’s the data I’ll be sending to Jupyter.)
I’ll select the columns that I want from each source, so my dataset is ready for the clustering analysis.
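The query builder does this join visually. For readers who prefer code, here is a rough pandas equivalent. It's a sketch, not what Autodaas runs under the hood, and the column names (campaign_id, campaign_name, month, room) and rows are hypothetical.

```python
import pandas as pd

# Hypothetical campaign table: one row per campaign run.
campaign = pd.DataFrame({
    "campaign_id": [1, 2, 3],
    "campaign_name": ["Paper recycling", "Water conservation", "Paper recycling"],
    "month": ["2017-01", "2017-01", "2017-02"],
})

# Hypothetical participant table: one row per room that met a campaign's goal.
participant = pd.DataFrame({
    "campaign_id": [1, 1, 2, 3],
    "room": ["A101", "B202", "A101", "C303"],
})

# Inner join on campaign_id, then keep only the columns the clustering needs.
campaign_participant = participant.merge(campaign, on="campaign_id", how="inner")[
    ["room", "campaign_name", "month"]
]
print(campaign_participant)
```

An inner join drops any participant row whose campaign_id has no match in the campaign table, which is usually what you want for a clean analysis dataset.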
Analyze data with Jupyter
Next, I connect my Jupyter notebook to Autodaas and bring in the campaign_participant dataset and the student table. I’ll use several Python libraries to pivot the data and the k-means clustering algorithm to cluster it.
The goal here is to create groups of students (“clusters”), based on their room’s involvement in similar environmental campaigns. For example, if several students participated heavily in dorm room recycling, somewhat in water conservation, but didn’t conserve electricity at all, we would like to group them together. Grouping allows us to better track, analyze and promote our campaigns.
I’ll use the Python pandas library to convert our campaign_participant dataset into a pivot table (matrix). The matrix crosses 32 campaign-months with the participation records of 100 dorm rooms.
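A minimal sketch of the pivot step, assuming campaign_participant has hypothetical room, campaign_name and month columns, and assuming each cell should simply flag whether a room participated (1) or not (0); the post doesn't show the exact aggregation used.

```python
import pandas as pd

# Hypothetical campaign_participant rows (one per room per successful campaign-month).
campaign_participant = pd.DataFrame({
    "room": ["A101", "B202", "A101", "C303", "B202"],
    "campaign_name": ["Paper recycling", "Paper recycling", "Water conservation",
                      "Paper recycling", "Water conservation"],
    "month": ["2017-01", "2017-01", "2017-01", "2017-02", "2017-01"],
})

# Label each column by campaign and month, e.g. "Paper recycling 2017-01".
campaign_participant["campaign_month"] = (
    campaign_participant["campaign_name"] + " " + campaign_participant["month"]
)

# Rows = rooms, columns = campaign-months; clip turns counts into 0/1 flags.
matrix = pd.crosstab(
    campaign_participant["room"], campaign_participant["campaign_month"]
).clip(upper=1)
print(matrix)
```

Rooms that never appear in campaign_participant won't get a row. If you want them in the clustering too, reindex the matrix on the full room list with `fill_value=0`.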
Then I’ll feed the matrix into the KMeans clustering algorithm, which divides the data into a chosen number of groups (here, 3) of similar data points.
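Here's a sketch of the clustering step with scikit-learn's `KMeans`. The matrix is random synthetic data standing in for the real 100 × 32 pivot table, and k = 3 is chosen to match the three clusters in the graph; the post doesn't say how the number of clusters was picked.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the room x campaign-month matrix (100 rooms, 32 columns of 0/1).
rng = np.random.default_rng(0)
matrix = pd.DataFrame(
    rng.integers(0, 2, size=(100, 32)),
    index=[f"room_{i}" for i in range(100)],
    columns=[f"campaign_month_{j}" for j in range(32)],
)

# k = 3 to match the post; n_init and random_state make the result repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# Keep the labels in a separate Series so rerunning the cell never feeds them back in as a feature.
room_clusters = pd.Series(kmeans.fit_predict(matrix), index=matrix.index, name="cluster")
print(room_clusters.head())
```

K-means depends on its starting centroids, so fixing `random_state` keeps the cluster numbering stable between reruns on the same data.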
Let’s aggregate the cluster column to see how many participants were placed in each group.
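Counting rooms per cluster is a one-line groupby. The room names and labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical cluster labels, one per room, as produced by KMeans.
room_clusters = pd.DataFrame({
    "room": ["A101", "B202", "C303", "D404", "E505"],
    "cluster": [0, 2, 2, 1, 0],
})

# Number of rooms placed in each cluster.
cluster_sizes = room_clusters.groupby("cluster").size()
print(cluster_sizes)
```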
Next, I apply Principal Component Analysis (PCA) to reduce the data to two dimensions (X and Y) so that it can be plotted.
- Now I have an X and Y coordinate for each data point. My graph shows the 3 clusters, based on their campaign participation.
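A sketch of the PCA projection and plot, again on synthetic data rather than the university's. PCA is used only to draw the picture: the clusters come from KMeans on the full 32-dimensional rows, so points from different clusters can overlap in the 2-D view.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the 100-room x 32-campaign-month matrix.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 32))

# Cluster in the original 32-dimensional space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Project each row onto the first two principal components (the X and Y coordinates).
coords = PCA(n_components=2).fit_transform(X)

fig, ax = plt.subplots()
scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis")
ax.set_xlabel("Principal component 1 (X)")
ax.set_ylabel("Principal component 2 (Y)")
ax.legend(*scatter.legend_elements(), title="Cluster")
plt.show()
```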
Finally, I’ll count campaign participation within each cluster. This shows what participation in each campaign looks like within each cluster, which matters because we want to reach out to and support each cluster based on this information.
- Which group did best at the Paper recycling campaign? Group 2, with a total of 8 dorm rooms participating.
- Which group did poorly at the Water conservation campaign? Group 0 had no participation in our water campaigns.
- Which group cares about the most campaigns? Group 2 participated in 9 campaigns.
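One way to build this per-cluster participation table is to attach each room's cluster to its campaign records and cross-tabulate. The rooms, campaigns and cluster labels below are hypothetical toy data, so the numbers won't match the results above; the three lookups at the end mirror the three questions.

```python
import pandas as pd

# Hypothetical cluster label for each room (output of the KMeans step).
room_clusters = pd.DataFrame({
    "room": ["A101", "B202", "C303", "D404"],
    "cluster": [0, 2, 2, 1],
})

# Hypothetical participation records: one row per room per successful campaign.
campaign_participant = pd.DataFrame({
    "room": ["A101", "B202", "C303", "B202", "D404", "C303"],
    "campaign_name": ["Paper recycling", "Paper recycling", "Paper recycling",
                      "Water conservation", "Water conservation", "Clothing donation"],
})

# Attach each room's cluster, then count participating rooms per cluster and campaign.
merged = campaign_participant.merge(room_clusters, on="room", how="inner")
by_cluster = pd.crosstab(merged["cluster"], merged["campaign_name"])
print(by_cluster)

# Which cluster had the most rooms in paper recycling?
best_paper = by_cluster["Paper recycling"].idxmax()
# Which clusters had no water conservation participation at all?
no_water = by_cluster.index[by_cluster["Water conservation"] == 0].tolist()
# How many distinct campaigns did each cluster take part in?
campaigns_per_cluster = (by_cluster > 0).sum(axis=1)
```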
How can we use this data?
Based on our analysis, the university can now send emails to these various groups, encouraging them to participate in campaigns, and offering incentives to do so.
“Let’s offer a 10% discount at the university bookstore to anyone from Group 0 who donates a clothing item in the next Clothing Donation campaign.”
I’ll use the pandas merge function to join our student_clusters dataset to the student table to get a list of Group 0’s emails.
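A minimal version of that merge, assuming student_clusters has hypothetical room and cluster columns and the student table has email and room columns; the addresses are placeholders.

```python
import pandas as pd

# Hypothetical room-level cluster labels.
student_clusters = pd.DataFrame({"room": ["A101", "B202", "C303"], "cluster": [0, 2, 0]})

# Hypothetical student table: several students can share one dorm room.
student = pd.DataFrame({
    "email": ["ana@example.edu", "ben@example.edu", "cal@example.edu", "dee@example.edu"],
    "room": ["A101", "A101", "B202", "C303"],
})

# Join on room so each student inherits the cluster of their room.
students_with_cluster = student.merge(student_clusters, on="room", how="inner")

# E-mail addresses for everyone in Group 0.
group0_emails = students_with_cluster.loc[
    students_with_cluster["cluster"] == 0, "email"
].tolist()
print(group0_emails)
```

Because clusters are assigned per room, every student in a Group 0 room receives the offer, which matches how the campaigns track participation.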
Since Autodaas maintains live connections to data sources, we can refresh our Jupyter notebook in two weeks and see updated numbers, which will tell us whether our offer is working.
Can we motivate students to be more environmentally conscious?
Yes! The university can now use email, posters, discounts, and perks to encourage participation through better-targeted environmental campaigns. Because each campaign will be aimed at a group of students with similar habits, we believe students will be more likely to respond than they would to a general catch-all campaign.
Further, we can study student groups, track how they perform, and improve our process. We’ll learn what each group cares about and try to motivate each group to participate in the campaigns it missed or underperformed in.
Long term, we think the university will become more environmentally friendly, attract like-minded students, save money, get positive media coverage, and, most importantly, produce better graduates and citizens.
Clustering and segmentation are powerful data tools to help you understand and target your audience.
We built Autodaas to help data scientists and data analysts focus on answering questions quickly, rather than on the infrastructure, configuration, and administration of a data environment. Our goal is to make it easy to get all your data in one place, so you can use it with your favorite tools. We’re excited about the advanced analyses you can create with Autodaas.
If you’re interested in learning more about Autodaas, contact us to see a demo.