My YouTube Project (Part 2)


A little bit about my YouTube project

Published on December 08, 2023 by Jeffry Troll

data science YouTube API

4 min READ

This is a walk-through of the project, focused on data visualization and Streamlit.

Introduction:

As I mentioned before, YouTube has been an important part of the day for me and my wife. We use it to learn and to spend time together. We were planning to start our own YouTube channel about cooking and food, so I started this little project to better understand what successful cooking YouTubers have done and get an idea of how to do it ourselves.

Here is the repository of the project

Visualizations:

As part of the initial exploratory data analysis (EDA), I began by loading the dataset extracted from the YouTube API. However, I noticed that the first column, “Unnamed: 0,” was unnecessary, so I removed it. Additionally, I added two new columns to the dataset: “engagement,” calculated as the sum of comment_count and like_count divided by view_count, and “collaboration,” which is binary (1 or 0) to indicate whether a video involved collaboration.

  
import numpy as np

# Drop the unnecessary column
df = df.drop(columns='Unnamed: 0')

# Add the 'engagement' column: (comments + likes) per view
df['engagement'] = (df['comment_count'] + df['like_count']) / df['view_count']
# Recode the 'collaboration' flag as 1/0
df['collaboration'] = np.where(df['collaboration'] == True, 1, 0)
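
One caveat with the ratio above: a video with zero views makes the division produce inf or NaN instead of a usable number. Here is a minimal sketch of one way to guard against that, using made-up numbers. The post does not say whether the real dataset contains zero-view videos, so treat this as an optional safety step rather than part of the original analysis.

```python
import numpy as np
import pandas as pd

# Made-up rows; the second video has no views at all.
toy = pd.DataFrame({
    "comment_count": [10, 0, 5],
    "like_count": [90, 0, 45],
    "view_count": [1000, 0, 500],
})

# Turn zero views into NaN so the ratio is "missing" instead of inf.
views = toy["view_count"].replace(0, np.nan)
toy["engagement"] = (toy["comment_count"] + toy["like_count"]) / views

print(toy)
```

Rows with a missing engagement value can then be dropped or inspected separately before plotting.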

  

I also noticed that some data from the years 2014 to 2017 had limited representation, so I decided to focus on data from 2018 onwards for a more relevant analysis.

  
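# .copy() gives an independent DataFrame, so adding columns later does not raise SettingWithCopyWarning
df_year_cleaned = df[df['year'] >= 2018].copy()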
  

To understand the distribution of video lengths, I created a histogram.

[Figure: histogram of video durations]

The histogram revealed the presence of both regular YouTube videos and Shorts, with Shorts typically lasting around 30 seconds. To classify these videos, I introduced a “short” column based on a 1-minute threshold.

  
df_year_cleaned['short'] = np.where(df_year_cleaned['duration_in_minutes'] <= 1, 1, 0)
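
To sanity-check a threshold like this, it can help to count videos in narrow duration bins and look at the share that ends up flagged as Shorts. The sketch below does that with made-up durations; only the `duration_in_minutes` and `short` column names come from the post, and the 30-second bin width is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Made-up durations in minutes, only to illustrate the binning.
toy = pd.DataFrame(
    {"duration_in_minutes": [0.4, 0.5, 0.55, 0.9, 8.0, 11.5, 12.0, 15.2]}
)

# Same rule as in the post: anything up to one minute counts as a Short.
toy["short"] = np.where(toy["duration_in_minutes"] <= 1, 1, 0)

# Count videos per 30-second bin over the first two minutes.
edges = np.arange(0, 2.5, 0.5)  # 0.0, 0.5, 1.0, 1.5, 2.0
counts, _ = np.histogram(toy["duration_in_minutes"], bins=edges)

# Fraction of all videos flagged as Shorts.
short_share = toy["short"].mean()

print(dict(zip(edges[:-1].tolist(), counts.tolist())))
print(short_share)
```

If the bins just above one minute are close to empty, the 1-minute cut separates the two groups cleanly.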
  

I also examined the distribution of title lengths.

[Figure: distribution of title lengths]

The graph shows that title length roughly follows a normal distribution, which I find interesting.

Now let’s dive into the really fun part.

Do you think there’s a relationship between likes and views? Let’s see.

[Figure: likes vs. views]

I’m not sure, but it looks roughly linear, so let’s look at a better view.

[Figure: likes vs. views, by YouTuber]

It’s interesting to see the different slopes for each YouTuber, but the relationship is indeed linear.
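
The different slopes can also be put into numbers by fitting a straight line of likes against views for each channel. This is a minimal sketch on made-up data, not the code behind the plot: the column name `channel_title` is an assumption (the post does not show what the channel column is called), and `likes_per_view_slope` is a helper invented for this example.

```python
import numpy as np
import pandas as pd

# Made-up data: channel A earns about 5 likes per 100 views, channel B about 1.
toy = pd.DataFrame({
    "channel_title": ["A"] * 4 + ["B"] * 4,  # hypothetical column name
    "view_count": [1000, 2000, 3000, 4000, 1000, 2000, 3000, 4000],
    "like_count": [50, 100, 150, 200, 10, 20, 30, 40],
})

def likes_per_view_slope(group):
    """Least-squares slope of like_count on view_count for one channel."""
    slope, _intercept = np.polyfit(group["view_count"], group["like_count"], deg=1)
    return slope

slopes = {
    name: likes_per_view_slope(group)
    for name, group in toy.groupby("channel_title")
}
print(slopes)
```

A steeper slope means that channel collects more likes for every additional view.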

What about the relationship between the length of the videos and their views?

[Figure: video length vs. views]

Not quite as clear as likes and views. Let’s come back to the views, this time in a boxplot.

[Figure: boxplot of views by YouTuber]

That’s cool: it confirms that Nick DiGiovanni is a bigger YouTuber. Also, I like those boxplots, so let’s check the views by year.

[Figure: boxplot of views by year]

It looks like the Covid bump is ending and fewer people are spending time on YouTube, or we need to wait until 2023 finishes, or people are getting less excited about food and cooking.
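
One way to tell these explanations apart is to put the number of videos per year next to the median views, since a partial year has fewer videos and they have had less time to collect views. Below is a minimal sketch on made-up numbers; the `year` and `view_count` column names are taken from the post, everything else is illustrative.

```python
import pandas as pd

# Made-up views; 2023 has a single video to mimic an unfinished year.
toy = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2021, 2022, 2022, 2023],
    "view_count": [500, 700, 900, 1100, 1000, 600, 800, 300],
})

# Video count and median views per year, side by side.
by_year = toy.groupby("year")["view_count"].agg(["count", "median"])
print(by_year)
```

A low median paired with a low count is weaker evidence of a real decline than a low median over a full year of uploads.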

What about the relationship between the upload day and the views? Let’s find out.

[Figure: views by upload day of the week]
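
A comparison of views by upload day can be sketched as follows. The data is made up, and `published_at` is an assumed name for the upload timestamp column (the post does not show it); the median is used here only because view counts tend to be skewed, not because the post says which statistic was plotted.

```python
import numpy as np
import pandas as pd

# Made-up uploads; 2023-01-07, 2023-01-08 and 2023-01-14 fall on a weekend.
toy = pd.DataFrame({
    "published_at": pd.to_datetime(  # hypothetical column name
        ["2023-01-02", "2023-01-07", "2023-01-08", "2023-01-11", "2023-01-14"]
    ),
    "view_count": [100, 400, 300, 120, 500],
})

# Name of the weekday, e.g. "Monday".
toy["weekday"] = toy["published_at"].dt.day_name()

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
toy["day_type"] = np.where(toy["published_at"].dt.dayofweek >= 5, "weekend", "weekday")

median_views = toy.groupby("day_type")["view_count"].median()
print(median_views)
```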

I’m not surprised that videos uploaded during the weekend get more views, so this confirms my theory. So, is there any correlation then?

[Figure: correlation matrix]

In summary, the strongest correlation is between likes and views (0.83), a strong positive relationship. Views and comments have a moderate positive correlation (0.37), while duration in minutes has a weak negative correlation (-0.22) with views. The short flag has a moderate positive correlation (0.31) with views, and collaboration has a very weak positive correlation (0.099) with views.
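
Coefficients like these come from a pairwise correlation matrix. The sketch below shows the idea with pandas on made-up numbers, so its values will not match the ones quoted above; Pearson correlation is assumed because it is the pandas default, and the post does not say which method was used.

```python
import pandas as pd

# Made-up numbers; column names follow the ones used in the post.
toy = pd.DataFrame({
    "view_count": [100, 200, 300, 400, 500],
    "like_count": [12, 18, 33, 41, 48],
    "duration_in_minutes": [14.0, 12.0, 9.0, 0.5, 0.6],
})

# Pairwise Pearson correlation between all numeric columns.
corr = toy.corr(method="pearson")

# Correlation of every other column with views, strongest first.
views_corr = corr["view_count"].drop("view_count").sort_values(ascending=False)
print(views_corr)
```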

Key Insights:

[Figure: video lengths]

Video Lengths: For our cooking channel, we should aim for regular videos to be at least 10 minutes long, and Shorts should be around 30 seconds.

[Figure: title lengths]

Title Length: Titles for regular videos should ideally be around 50 characters, while Shorts can have shorter titles, approximately 20 characters.

[Figure: engagement rate of Shorts vs. regular videos by channel]

Shorts Impact: Shorts tend to receive a higher engagement rate on average for all three channels, suggesting that creating Shorts can be an effective strategy to engage the audience.

[Figure: likes vs. views]

Likes and Views Relationship: There is a strong positive correlation (0.83) between likes and views, meaning that videos with more likes generally also have more views. The correlation alone does not tell us which one drives the other.

[Figure: views by year]

Views by Year: The graph shows a decline in views in 2021 and 2022, which may mean that the surge in YouTube viewership during the COVID-19 pandemic is subsiding.

[Figure: views by day of the week]

Views by Day of the Week: Videos uploaded during the weekend tend to receive more views, highlighting the importance of strategic video publishing.

[Figures: upload patterns by channel]

Consistency in Uploads: Successful YouTubers follow a consistent pattern in uploading videos, which can help maintain viewer engagement.
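
Upload consistency can be quantified by looking at the gaps between consecutive uploads per channel. This is a minimal sketch on made-up dates: `channel_title` and `published_at` are assumed column names that the post does not show, and using the median and standard deviation of the gaps is only one possible definition of "consistent".

```python
import pandas as pd

# Made-up upload dates: channel A posts weekly, channel B irregularly.
toy = pd.DataFrame({
    "channel_title": ["A", "A", "A", "A", "B", "B", "B", "B"],  # hypothetical
    "published_at": pd.to_datetime([  # hypothetical column name
        "2023-01-01", "2023-01-08", "2023-01-15", "2023-01-22",
        "2023-01-01", "2023-01-03", "2023-01-20", "2023-01-21",
    ]),
})

# Sort so that diff() compares each upload with the previous one of the same channel.
toy = toy.sort_values(["channel_title", "published_at"])
toy["days_since_last"] = toy.groupby("channel_title")["published_at"].diff().dt.days

# Typical gap (median) and its spread (std); a small spread means a steady schedule.
gap_stats = toy.groupby("channel_title")["days_since_last"].agg(["median", "std"])
print(gap_stats)
```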

Conclusion:

This project has provided valuable insights into the world of successful cooking YouTube channels. While it didn’t reveal groundbreaking discoveries, it has equipped us with a solid foundation to start our own channel. We’ve also created a web application to explore the data further. https://youtube-project-jt.streamlit.app/