Kelaberetiv

kəˈlabərətɪv/ 
adjective: collaborative

Bringing together the AI community in Singapore – companies, startups, researchers, students, professionals – to collaborate, find research and business opportunities and talent.


pandas Foundations  


Liyi Ang
(@liyi)
Member
Joined: 6 months ago
Posts: 52
November 21, 2018 9:06 pm  

Do you have any questions relating to pandas Foundations? Leave them here!


hanqi
(@hanqi)
Active Member
Joined: 4 months ago
Posts: 13
December 8, 2018 12:21 am  

In 4. Case study – sunlight in Austin – Daily hours of clear sky, is the exercise's approach incorrect? The data for this exercise has too many rows on some days and too few on others. The task was to draw a box plot of the fraction of each day that had clear sky. df_clean has 10337 rows (representing 1 year of hourly data, as introduced by the tutor; note that this is more than 365*24 = 8760, which is the first warning sign). After is_sky_clear.resample('D').count(), many days have more than 24 entries (on these days the sea_level_pressure column contains 'M', representing missing values) and one day has only 18 entries.
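The per-day row counts described above can be checked with a few lines of pandas. This is only a sketch on synthetic data: the index below is made up to imitate one day with extra rows and one day with missing hours, and is not the course dataset.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

# Synthetic stand-in (an assumption, not the course data):
# day one has 24 hourly rows plus 2 extra rows, day two has only 18 rows.
day1 = pd.date_range("2011-01-01 00:00", periods=24, freq=HOUR)
extra = pd.to_datetime(["2011-01-01 05:30", "2011-01-01 05:45"])
day2 = pd.date_range("2011-01-02 00:00", periods=18, freq=HOUR)
index = day1.append(extra).append(day2).sort_values()
is_sky_clear = pd.Series(True, index=index)

# Number of rows recorded on each calendar day
rows_per_day = is_sky_clear.resample("D").count()

# Days whose row count differs from the 24 expected for hourly data
irregular_days = rows_per_day[rows_per_day != 24]
print(irregular_days)
```

On the real data, the same two lines on is_sky_clear would list every day that does not have exactly 24 rows.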

The exercise computes is_sky_clear.resample('D').sum() / is_sky_clear.resample('D').count() to get the fraction of each day with clear sky, but it seems to overlook that an hour is not necessarily represented by exactly one row (or by any row at all). The daily fraction is therefore biased by the uneven number of data points per hour.

How do we correctly answer the question of what fraction of each day has clear sky, then? By eliminating the days that do not have exactly 24 rows in is_sky_clear.resample('D').count()? And how important is it to use a fixed time interval when collecting time series data?
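One possible way to reduce the bias, sketched on synthetic data (the series below is invented for illustration, not taken from the course): collapse the readings to one value per hour first, then average the hourly values within each day. Whether this counts as the "correct" answer depends on assumptions about how the readings were taken, so treat it as one option rather than the exercise's intended solution.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

# Synthetic stand-in (an assumption): one day that is clear for hours 0-11
# and overcast for hours 12-23, with three extra overcast readings in hour 12.
hours = pd.date_range("2011-01-01", periods=24, freq=HOUR)
extra = pd.to_datetime(["2011-01-01 12:15", "2011-01-01 12:30", "2011-01-01 12:45"])
values = [True] * 12 + [False] * 12 + [False] * 3
is_sky_clear = pd.Series(values, index=hours.append(extra)).sort_index()

# Row-weighted fraction (the ratio discussed above): every row counts equally,
# so hours with extra readings weigh more.
naive = is_sky_clear.resample("D").sum() / is_sky_clear.resample("D").count()

# Hour-weighted fraction: reduce to one value per hour, then average the
# hourly values within each day. Hours with no rows become NaN and are
# skipped by mean().
hourly = is_sky_clear.astype(float).resample(HOUR).mean()
balanced = hourly.resample("D").mean()

# Hours actually observed per day, useful for flagging incomplete days
hours_observed = hourly.resample("D").count()

print(naive.iloc[0], balanced.iloc[0], hours_observed.iloc[0])
```

Here the row-weighted ratio is pulled below one half by the crowded hour, while the hour-weighted version gives each hour equal weight. Days with fewer than 24 observed hours could then be dropped with balanced[hours_observed == 24] if only complete days should enter the box plot.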

