Statistical Thinking in Python (Part 2)
Do you have any questions relating to Statistical Thinking in Python (Part 2)? Leave them here!
Under Bootstrap confidence intervals - Visualizing bootstrap samples.
We plot the ECDFs of 50 sets of bootstrapped samples on top of the ECDF of the original data.
Why does the band of bootstrapped ECDFs look thicker in the middle region compared to the ends? (You can exaggerate this effect by doing 300 instead of 50 iterations of data generation and plotting.)
Can I conclude that np.random.choice is not selecting uniformly from the original data, and is actually selecting from the 700-900 mm rainfall section more frequently?
Or is this conclusion false because
1) the thinner gray ECDFs closer to the edges (rainfall < 600 mm or rainfall > 1000 mm) could actually contain the same number of gray points, but they overlay on top of each other so I can't see them, and
2) the vertical position of each gray point represents its rank in the sorted data, and numbers in the middle have "much more room to move around" in their rank compared to numbers at the edges of the range?
What can we do to the bootstrapped samples to prove quantitatively that np.random.choice is indeed uniform?
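One quantitative check is to tally how often each original observation is drawn across many bootstrap resamples: under uniform sampling, every index should be selected with equal frequency. A minimal sketch (the `data` array below is a hypothetical stand-in for the course's rainfall data; tallying indices rather than values avoids problems with tied values):

```python
import numpy as np

np.random.seed(42)

# Hypothetical stand-in for the rainfall data from the exercise (mm).
data = np.random.normal(800, 100, size=70)
n = len(data)

# Resample indices with np.random.choice and count how often each
# original observation is drawn, accumulated over many resamples.
n_reps = 10_000
counts = np.zeros(n)
for _ in range(n_reps):
    idx = np.random.choice(np.arange(n), size=n)
    counts += np.bincount(idx, minlength=n)

# Under uniform sampling, every index should receive roughly the same
# share of the total draws, i.e. each fraction should be close to 1/n.
observed_fraction = counts / counts.sum()
print(observed_fraction.min(), observed_fraction.max())
```

If the min and max fractions are both close to 1/n (with only binomial-sized noise), sampling is uniform, and the thick middle of the ECDF band comes from ranks in the middle having more room to vary, not from biased selection.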
Why do one-tailed rather than two-tailed tests? Is a one-tailed test used because there is actually a minimum or maximum bound (whether natural or artificial/financial/practical) that makes values in the opposite, bounded tail irrelevant? Or could it be that there is no bound, but the experimenter simply doesn't care about one side? If so, why and how? Are there examples where information in one direction is more valuable than information in the opposite direction? (This is what I infer when I see a one-tailed rather than a two-tailed test.)
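The mechanical difference between the two choices is easy to see in a permutation test: the one-tailed p-value counts only replicates at least as extreme in the direction of interest, while the two-tailed p-value counts extremes in either direction. A sketch on synthetic data (the groups and effect size here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-group comparison: does B have a higher mean than A?
a = rng.normal(10.0, 2.0, size=50)
b = rng.normal(12.0, 2.0, size=50)
observed = b.mean() - a.mean()

# Permutation null: shuffle the pooled data and recompute the difference.
pooled = np.concatenate([a, b])
reps = np.empty(10_000)
for i in range(reps.size):
    perm = rng.permutation(pooled)
    reps[i] = perm[50:].mean() - perm[:50].mean()

# One-tailed: only "B > A" counts as evidence against the null.
p_one = np.mean(reps >= observed)
# Two-tailed: deviations in either direction count.
p_two = np.mean(np.abs(reps) >= abs(observed))
print(p_one, p_two)
```

For a symmetric null distribution the two-tailed p-value is roughly twice the one-tailed one, which is why choosing one tail must be justified before seeing the data (a genuine bound, or a question that is only about one direction), not after.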
In reality, are there more symmetric distributions or asymmetric distributions? Doesn't the symmetry depend on the measurement calibration/units? If I decide nothing can be below 0, the distribution is cut off at 0 and has only a right tail with no left tail, making symmetry impossible. If I allow negative numbers, the left tail appears. I feel this is important to understand because the existence of tails affects whether one-tailed or two-tailed tests are used.
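The truncation point can be demonstrated directly: a symmetric distribution with mass near zero becomes right-skewed once negative values are disallowed. A small sketch (the distribution and cutoff are invented for illustration; here "disallowed" means dropped):

```python
import numpy as np

rng = np.random.default_rng(1)

# A symmetric (normal) distribution whose mass sits near zero.
x = rng.normal(0.5, 1.0, size=100_000)

def skew(v):
    # Sample skewness: third central moment over the cubed std dev.
    d = v - v.mean()
    return np.mean(d**3) / np.std(v) ** 3

# Disallowing values below 0 removes the left tail, so the
# truncated sample is right-skewed even though the original was symmetric.
x_trunc = x[x >= 0]
print(skew(x), skew(x_trunc))
```

The first skewness is near 0 and the second is clearly positive, which matches the intuition in the question: whether a measurement scale admits a left tail is part of the modeling choice, and it does bear on whether a one- or two-tailed test makes sense.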