Skip to main content
Login
      • Discover
        • For Executives
          • For Startups
            • Lakehouse Architecture
              • Mosaic Research
              • Customers
                • Featured Stories
                  • See All Customers
                  • Partners
                    • Cloud Providers
                      Databricks on AWS, Azure, GCP, and SAP
                      • Consulting & System Integrators
                        Experts to build, deploy and migrate to Databricks
                        • Technology Partners
                          Connect your existing tools to your Lakehouse
                          • C&SI Partner Program
                            Build, deploy or migrate to the Lakehouse
                            • Data Partners
                              Access the ecosystem of data consumers
                              • Partner Solutions
                                Find custom industry and migration solutions
                                • Built on Databricks
                                  Build, market and grow your business
                                • Databricks Platform
                                  • Platform Overview
                                    A unified platform for data, analytics and AI
                                    • Data Management
                                      Data reliability, security and performance
                                      • Sharing
                                        An open, secure, zero-copy sharing for all data
                                        • Data Warehousing
                                          Serverless data warehouse for SQL analytics
                                          • Governance
                                            Unified governance for all data, analytics and AI assets
                                            • Real-Time Analytics
                                              Real-time analytics, AI and applications made simple
                                              • Artificial Intelligence
                                                Build and deploy ML and GenAI applications
                                                • Data Engineering
                                                  ETL and orchestration for batch and streaming data
                                                  • Business Intelligence
                                                    Intelligent analytics for real-world data
                                                    • Data Science
                                                      Collaborative data science at scale
                                                    • Integrations and Data
                                                      • Marketplace
                                                        Open marketplace for data, analytics and AI
                                                        • IDE Integrations
                                                          Build on the Lakehouse in your favorite IDE
                                                          • Partner Connect
                                                            Discover and integrate with the Databricks ecosystem
                                                          • Pricing
                                                            • Databricks Pricing
                                                              Explore product pricing, DBUs and more
                                                              • Cost Calculator
                                                                Estimate your compute costs on any cloud
                                                              • Open Source
                                                                • Open Source Technologies
                                                                  Learn more about the innovations behind the platform
                                                                • Databricks for Industries
                                                                  • Communications
                                                                    • Media and Entertainment
                                                                      • Financial Services
                                                                        • Public Sector
                                                                          • Healthcare & Life Sciences
                                                                            • Retail
                                                                              • Manufacturing
                                                                                • See All Industries
                                                                                • Cross Industry Solutions
                                                                                  • Customer Data Platform
                                                                                    • Cybersecurity
                                                                                    • Migration & Deployment
                                                                                      • Data Migration
                                                                                        • Professional Services
                                                                                        • Solution Accelerators
                                                                                          • Explore Accelerators
                                                                                            Move faster toward outcomes that matter
                                                                                          • Training and Certification
                                                                                            • Learning Overview
                                                                                              Hub for training, certification, events and more
                                                                                              • Training Overview
                                                                                                Discover curriculum tailored to your needs
                                                                                                • Databricks Academy
                                                                                                  Sign in to the Databricks learning platform
                                                                                                  • Certification
                                                                                                    Gain recognition and differentiation
                                                                                                    • University Alliance
                                                                                                      Want to teach Databricks? See how.
                                                                                                    • Events
                                                                                                      • Data + AI Summit
                                                                                                        • Data + AI World Tour
                                                                                                          • Data Intelligence Days
                                                                                                            • Event Calendar
                                                                                                            • Blog and Podcasts
                                                                                                              • Databricks Blog
                                                                                                                Explore news, product announcements, and more
                                                                                                                • Databricks Mosaic Research Blog
                                                                                                                  Discover the latest in our Gen AI research
                                                                                                                  • Data Brew Podcast
                                                                                                                    Let’s talk data!
                                                                                                                    • Champions of Data + AI Podcast
                                                                                                                      Insights from data leaders powering innovation
                                                                                                                    • Get Help
                                                                                                                      • Customer Support
                                                                                                                        • Documentation
                                                                                                                          • Community
                                                                                                                          • Dive Deep
                                                                                                                            • Resource Center
                                                                                                                              • Demo Center
                                                                                                                              • Company
                                                                                                                                • Who We Are
                                                                                                                                  • Our Team
                                                                                                                                    • Databricks Ventures
                                                                                                                                      • Contact Us
                                                                                                                                      • Careers
                                                                                                                                        • Working at Databricks
                                                                                                                                          • Open Jobs
                                                                                                                                          • Press
                                                                                                                                            • Awards and Recognition
                                                                                                                                              • Newsroom
                                                                                                                                              • Security and Trust
                                                                                                                                                • Security and Trust
                                                                                                                                            • Data and AI summit

                                                                                                                                              JUNE 9–12 | SAN FRANCISCO

                                                                                                                                              700+ sessions on all things data intelligence. Get ready to dive deep.

                                                                                                                                              REGISTER
                                                                                                                                            • Ready to get started?
                                                                                                                                            • Get a Demo
                                                                                                                                            Data and AI summit

                                                                                                                                            JUNE 9–12 | SAN FRANCISCO

                                                                                                                                            700+ sessions on all things data intelligence. Get ready to dive deep.

                                                                                                                                            REGISTER
                                                                                                                                            • Login
                                                                                                                                            • Try Databricks
                                                                                                                                            1. Blog
                                                                                                                                            2. /
                                                                                                                                              Insights
                                                                                                                                            3. /
                                                                                                                                              Article

                                                                                                                                            Understanding New Years Trends: A Simple, Unified Pipeline on the Databricks Lakehouse

                                                                                                                                            newyear-trends-blog-og

                                                                                                                                            Published: January 20, 2022

                                                                                                                                            Insights8 min read

                                                                                                                                            by Talia Visaggi, Nicholas Barretta and Mikaila Garfinkel

                                                                                                                                            Share this post

                                                                                                                                            Keep up with us

                                                                                                                                            Try the notebooks referenced throughout this post. Overview, Tweet Ingestion, Tweet Categorization & Result Population  

                                                                                                                                            For many people, the start of a new year marks the perfect time to make a change. That’s why, despite the rather polarizing nature, New Year’s resolutions remain an important tradition for kickstarting a personal goal.

                                                                                                                                            Oftentimes, they’re not terribly creative – improve fitness, adopt a hobby, go someplace new. But over the past two years, as we collectively handle a global pandemic, many of us have experienced a shift in mindset on what’s important or what success means. We’ve seen this shift in all sorts of ways — The Great Resignation, definitions of wealth, new norms for socializing and more.

                                                                                                                                            With this in mind and the onset of 2022, a few of us at Databricks thought it would be interesting to examine how post-pandemic life has impacted New Year’s resolutions, which are essentially snapshots into the most popular goals and trends. To do this, we used Databricks and the Twitter API to perform keyword search based on a pre-trained collection of word vectors provided by GloVe — and the results were pretty interesting.

                                                                                                                                            This blog post will walk through how exactly we went about executing this use case leveraging Databricks, the Twitter API and easily-accessible open source tools. Then, we’ll share the findings of our analysis, which we think truly reflect the changing nature of the times. Let’s dive in!

                                                                                                                                            Why Databricks?

                                                                                                                                            First, let’s give a brief intro to Databricks and why it made this use case so simple to execute.

                                                                                                                                            To perform this use case, we needed to aggregate the relevant data set from Twitter, process and prepare it for our keyword search, classify it, and then store the results in a place where the data can be queried and visualized in meaningful ways. Databricks offers us all of these capabilities out of the box with our Lakehouse platform, which combines the reliability, performance and governance of data warehouses with the openness and flexibility of data lakes. Not a single external system needed to be set up.

                                                                                                                                            Databricks facilitates easy implementation of the Lakehouse architecture and Delta Lake as a managed service, allowing data practitioners to take advantage of the cost-effective and highly-scalable nature of cloud object storage while also enabling performant queries and visualizations to be built on top of the stored data. Best of all, it does all of this without requiring data to be converted to a proprietary format or funneled into a traditional data warehouse. This means that an entire data team (data scientists, analysts, and data engineers) can execute this use case end-to-end with the tools they are most comfortable with and within an open, collaborative environment.

                                                                                                                                            How we did it

                                                                                                                                            Data Ingestion & Processing

                                                                                                                                            The first step was in many ways the hardest: determining what data set we’d use to capture global New Year’s resolutions. We determined Twitter was the best option since it’s a conversational platform with a global user base, is easily searchable and comes with a Developer API. Since the goal was to compare pre and post-pandemic goals, we needed a historical data set. We used a historical dataset from 2015 that provided over 5,000 New Year’s resolution-related tweets.

                                                                                                                                            For comparison’s sake, we then aggregated a data set of relevant tweets from this year using the Twitter API. First, we built a Notebook to ingest tweets and build our data set. We collected tweets based on selected phrases – #NewYearsResolutions and associated hashtags and keywords – between the dates of 12/17/2021 and 1/2/2022. We ended up with quite a large sample of tweets, so we randomly sampled approximately 10,000 of them to be more in line with the size of our historic data set.

                                                                                                                                            To accelerate the ingestion step, we used Tweepy, a Python library that makes it easy to interact with the Twitter API. As an aside, since Databricks notebooks allow for mixing languages, it was very easy to run a shell command to import the needed Python libraries into our environment and then write the rest of the code in Python. We did some cleanup of the text by removing things like URLs, punctuation, and hashtags.

                                                                                                                                            With our data prepared, and once again with the help of magic commands to mix languages, we inserted a SQL statement into our notebook to MERGE the data from our Apache Spark™ Dataframe into our bronze Delta Table. With every pull of the Twitter API, there were some tweets duplicated across multiple batches; the MERGE operation allows us to only push new tweets into our table, avoiding duplication.

                                                                                                                                            Classifying & analyzing tweets

                                                                                                                                            For this project, we followed a simplified version of a medallion architecture. In our case, we landed the pre-processed tweets into our bronze table via the MERGE, ran them through our classifier, and then used another MERGE to insert the results into our gold table. This highlights again how MERGE makes it really easy to push high volumes of unique records through a pipeline on top of Delta Lake without having to write complex logic for deduplication.

                                                                                                                                            For the actual tweet classification, we used the pre-trained GloVe vectors (downloaded via Gensim) to construct relevant categories and keywords for classifying each resolution. One really nice thing about the GloVe vectors is that they were trained on over 2 billion tweets worth of data from Twitter. This solved the challenge of us not having enough training data upfront to build our own vectors.

                                                                                                                                            After some discussion, we came up with these categories* as common New Year’s Resolutions themes:

                                                                                                                                            • exercise
                                                                                                                                            • learn something new
                                                                                                                                            • finance
                                                                                                                                            • eco-friendly
                                                                                                                                            • outdoors
                                                                                                                                            • travel
                                                                                                                                            • healthy diet
                                                                                                                                            • reading
                                                                                                                                            • self-care
                                                                                                                                            • quit smoking

                                                                                                                                            * We also had an “other” category for all tweets that didn’t fit into the topics above. We ended up not using the other category for our analysis since a large portion of these tweets consisted of ads, sarcastic or funny comments, trolling, and other irrelevant messages

                                                                                                                                            We came up with a few seed keywords for each category, and then GloVe provided additional keywords that were most relevant to each, giving us a basis to do our classification.

                                                                                                                                            Now that we had each category seeded with a large number of keywords, we ran each tweet through our classifier to determine the dominant category. We did this by counting the number of keywords from each category that appeared in each tweet: whichever category had the largest number of matched keywords is how we classified that tweet.

                                                                                                                                            We executed this process for both the 2015 and 2022 data sets. Using Databricks, we wrote these into a gold Delta Table and were able to quickly develop visualizations in Databricks SQL. This was the final product that was the basis for our analysis, which we’ll dive into below:

                                                                                                                                            A glimpse into the post-pandemic mindset

                                                                                                                                            While the 2015 dataset included human-labeled topics, we executed the above process for both the 2015 and 2022 data sets to classify all of the tweets according to our selected categories in order to get a consistent view.

                                                                                                                                            Now that our data science was complete, it was just a matter of using this data and visualizations to actually extract insights. We performed our analysis and were pretty surprised just how different the two years’ resolutions were. Here’s a summary of our findings:

                                                                                                                                            A growing interest in physical health

                                                                                                                                            “Eating better” and “exercising more” are some of the most stereotypical New Year’s resolutions. But when we compare 2015 and 2022, it’s clear that there’s a more meaningful shift at play.

                                                                                                                                            In 2015, self-care – usually used to describe overall wellbeing with an emphasis on physical and mental behaviors, mindfulness, etc – was the most common New Year’s resolution. This theme still remains strong in 2022, as it was the second most popular resolution.

                                                                                                                                            However, a stark contrast is the increased focus on physical health goals. Pre-pandemic, healthy-diet wasn’t hugely top of mind, accounting for only 12.5% of tweets. Healthy diet nearly doubled in 2022, making it the top resolution on Twitter. This dramatic change makes total sense given the context of the time. For many of us, the pandemic has pushed ideas of health risks and ailments to the top of our minds. While it might not be directly related to COVID-19, it’s not surprising to see people set goals around adopting an overall healthier lifestyle and eating habits.

                                                                                                                                            Less desire to learn

                                                                                                                                            Another noticeable difference between the two years is in the learn something new category, which can really describe anything from picking up a new hobby to acquiring a skill set to just expanding overall knowledge. As you can see, 2015 showed a huge interest in learning something new and ranked #2 in popularity. However, in 2022, that number shrank from 13% to less than 9%, bumping it down to #5.

                                                                                                                                            Like a healthy diet, it’s possible to view this shift as a response to the past two years. In that timeframe, people have had to spend significantly more time at home, often apart from friends and most loved ones. Naturally, without the typical avenues of entertainment and going out, many of us had ample time to explore new avenues and hobbies. But two years in, it’s not surprising to see that people are rather fatigued of ‘learning’ or perhaps have already reached these goals and are ready to commit to something different, such as behaviors to improve health.

                                                                                                                                            Some things never change

                                                                                                                                            It’s important to note that while a lot has changed, a lot has also stayed the same in terms of what people care about and their personal motivations.

                                                                                                                                            One stable New Year’s resolution was reading. While it’s great to see reading as high on the list both years, this was a little surprising given that learning-new experienced such a dip in 2022. However, its ability to remain a top priority in 2022 could be explained by fatigue around connecting online (e.g., Zoom meetings and happy hours) and more time spent online or on streaming services. With this in mind, it seems practical that a lot of people are ready to take breaks from the Internet and explore a different avenue of entertainment.

                                                                                                                                            Another constant that was exciting to see was the consistent focus on self-care. While, as mentioned above, it lost its spot as the #1 resolution, there wasn’t a big change between 2015 and 2021 (22.3% and 19.5%, respectively). Considering the stresses and unknowns since 2019, all we can say is we’re happy to see that people are still prioritizing taking care of their own needs and health.

                                                                                                                                            Conclusion

                                                                                                                                            These are just some of our insights from comparing 2015 and 2022 New Year’s resolutions, but they do suggest a growing shift in our personal goals and interests. Even more so, this use case shows how Dabricks’ Lakehouse truly is a unified platform. Every teammate involved was able to execute every aspect of this use case on Databricks, and do it quickly and collaboratively.

                                                                                                                                            New to Lakehouse? Check out this blog post from our co-founders for an overview of the architecture and how it can be leveraged across data teams.

                                                                                                                                            Keep up with us

                                                                                                                                            Recommended for you

                                                                                                                                            Share this post

                                                                                                                                            Never miss a Databricks post

                                                                                                                                            Subscribe to the categories you care about and get the latest posts delivered to your inbox

                                                                                                                                            Sign up

                                                                                                                                            What's next?

                                                                                                                                            MIT and Databricks Report

                                                                                                                                            Insights

                                                                                                                                            October 16, 2023/3 min read

                                                                                                                                            How CIOs are laying the foundation for AI-led growth

                                                                                                                                            The role of AI in changing company structures and dynamics

                                                                                                                                            Data Strategy

                                                                                                                                            November 12, 2024/9 min read

                                                                                                                                            The role of AI in changing company structures and dynamics

                                                                                                                                            databricks logo
                                                                                                                                            Why Databricks
                                                                                                                                            Discover
                                                                                                                                            • For Executives
                                                                                                                                            • For Startups
                                                                                                                                            • Lakehouse Architecture
                                                                                                                                            • Mosaic Research
                                                                                                                                            Customers
                                                                                                                                            • Featured
                                                                                                                                            • See All
                                                                                                                                            Partners
                                                                                                                                            • Cloud Providers
                                                                                                                                            • Technology Partners
                                                                                                                                            • Data Partners
                                                                                                                                            • Built on Databricks
                                                                                                                                            • Consulting & System Integrators
                                                                                                                                            • C&SI Partner Program
                                                                                                                                            • Partner Solutions
                                                                                                                                            Discover
                                                                                                                                            • For Executives
                                                                                                                                            • For Startups
                                                                                                                                            • Lakehouse Architecture
                                                                                                                                            • Mosaic Research
                                                                                                                                            Customers
                                                                                                                                            • Featured
                                                                                                                                            • See All
                                                                                                                                            Partners
                                                                                                                                            • Cloud Providers
                                                                                                                                            • Technology Partners
                                                                                                                                            • Data Partners
                                                                                                                                            • Built on Databricks
                                                                                                                                            • Consulting & System Integrators
                                                                                                                                            • C&SI Partner Program
                                                                                                                                            • Partner Solutions
                                                                                                                                            Product
                                                                                                                                            Databricks Platform
                                                                                                                                            • Platform Overview
                                                                                                                                            • Sharing
                                                                                                                                            • Governance
                                                                                                                                            • Artificial Intelligence
                                                                                                                                            • Business Intelligence
                                                                                                                                            • Data Management
                                                                                                                                            • Data Warehousing
                                                                                                                                            • Real-Time Analytics
                                                                                                                                            • Data Engineering
                                                                                                                                            • Data Science
                                                                                                                                            Pricing
                                                                                                                                            • Pricing Overview
                                                                                                                                            • Pricing Calculator
                                                                                                                                            Open Source
                                                                                                                                            Integrations and Data
                                                                                                                                            • Marketplace
                                                                                                                                            • IDE Integrations
                                                                                                                                            • Partner Connect
                                                                                                                                            Databricks Platform
                                                                                                                                            • Platform Overview
                                                                                                                                            • Sharing
                                                                                                                                            • Governance
                                                                                                                                            • Artificial Intelligence
                                                                                                                                            • Business Intelligence
                                                                                                                                            • Data Management
                                                                                                                                            • Data Warehousing
                                                                                                                                            • Real-Time Analytics
                                                                                                                                            • Data Engineering
                                                                                                                                            • Data Science
                                                                                                                                            Pricing
                                                                                                                                            • Pricing Overview
                                                                                                                                            • Pricing Calculator
                                                                                                                                            Integrations and Data
                                                                                                                                            • Marketplace
                                                                                                                                            • IDE Integrations
                                                                                                                                            • Partner Connect
                                                                                                                                            Solutions
                                                                                                                                            Databricks For Industries
                                                                                                                                            • Communications
                                                                                                                                            • Financial Services
                                                                                                                                            • Healthcare and Life Sciences
                                                                                                                                            • Manufacturing
                                                                                                                                            • Media and Entertainment
                                                                                                                                            • Public Sector
                                                                                                                                            • Retail
                                                                                                                                            • View All
                                                                                                                                            Cross Industry Solutions
                                                                                                                                            • Customer Data Platform
                                                                                                                                            • Cybersecurity
                                                                                                                                            Data Migration
                                                                                                                                            Professional Services
                                                                                                                                            Solution Accelerators
                                                                                                                                            Databricks For Industries
                                                                                                                                            • Communications
                                                                                                                                            • Financial Services
                                                                                                                                            • Healthcare and Life Sciences
                                                                                                                                            • Manufacturing
                                                                                                                                            • Media and Entertainment
                                                                                                                                            • Public Sector
                                                                                                                                            • Retail
                                                                                                                                            • View All
                                                                                                                                            Cross Industry Solutions
                                                                                                                                            • Customer Data Platform
                                                                                                                                            • Cybersecurity
                                                                                                                                            Resources
                                                                                                                                            Documentation
                                                                                                                                            Customer Support
                                                                                                                                            Community
                                                                                                                                            Training and Certification
                                                                                                                                            • Learning Overview
                                                                                                                                            • Training Overview
                                                                                                                                            • Certification
                                                                                                                                            • University Alliance
                                                                                                                                            • Databricks Academy Login
                                                                                                                                            Events
                                                                                                                                            • Data + AI Summit
                                                                                                                                            • Data + AI World Tour
                                                                                                                                            • Data Intelligence Days
                                                                                                                                            • Full Calendar
                                                                                                                                            Blog and Podcasts
                                                                                                                                            • Databricks Blog
                                                                                                                                            • Databricks Mosaic Research Blog
                                                                                                                                            • Data Brew Podcast
                                                                                                                                            • Champions of Data & AI Podcast
                                                                                                                                            Training and Certification
                                                                                                                                            • Learning Overview
                                                                                                                                            • Training Overview
                                                                                                                                            • Certification
                                                                                                                                            • University Alliance
                                                                                                                                            • Databricks Academy Login
                                                                                                                                            Events
                                                                                                                                            • Data + AI Summit
                                                                                                                                            • Data + AI World Tour
                                                                                                                                            • Data Intelligence Days
                                                                                                                                            • Full Calendar
                                                                                                                                            Blog and Podcasts
                                                                                                                                            • Databricks Blog
                                                                                                                                            • Databricks Mosaic Research Blog
                                                                                                                                            • Data Brew Podcast
                                                                                                                                            • Champions of Data & AI Podcast
                                                                                                                                            About
                                                                                                                                            Company
                                                                                                                                            • Who We Are
                                                                                                                                            • Our Team
                                                                                                                                            • Databricks Ventures
                                                                                                                                            • Contact Us
                                                                                                                                            Careers
                                                                                                                                            • Open Jobs
                                                                                                                                            • Working at Databricks
                                                                                                                                            Press
                                                                                                                                            • Awards and Recognition
                                                                                                                                            • Newsroom
                                                                                                                                            Security and Trust
                                                                                                                                            Company
                                                                                                                                            • Who We Are
                                                                                                                                            • Our Team
                                                                                                                                            • Databricks Ventures
                                                                                                                                            • Contact Us
                                                                                                                                            Careers
                                                                                                                                            • Open Jobs
                                                                                                                                            • Working at Databricks
                                                                                                                                            Press
                                                                                                                                            • Awards and Recognition
                                                                                                                                            • Newsroom
                                                                                                                                            databricks logo

                                                                                                                                            Databricks Inc.
                                                                                                                                            160 Spear Street, 15th Floor
                                                                                                                                            San Francisco, CA 94105
                                                                                                                                            1-866-330-0121

                                                                                                                                            See Careers
                                                                                                                                            at Databricks

                                                                                                                                            © Databricks 2025. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the Apache Software Foundation.

                                                                                                                                            • Privacy Notice
                                                                                                                                            • |Terms of Use
                                                                                                                                            • |Modern Slavery Statement
                                                                                                                                            • |California Privacy
                                                                                                                                            • |Your Privacy Choices