[1] 18830917
Rows in ookla_us: 18830917
Raw Ookla tiles cover the entire world. This chapter filters to the US, converts speeds to Mbps, derives time variables, and builds a clean analysis table in DuckDB.
Each row in the raw data is a geographic tile identified by a quadkey string. To filter to the US we need tile geometries. The quadkey package decodes these to bounding boxes.
[1] 18830917
Rows in ookla_us: 18830917
year quarter tiles min_dl mean_dl median_dl max_dl
1 2022 1 1618733 0 67.5 33.5 3942.9
2 2022 2 1680185 0 69.1 34.2 4079.6
3 2022 3 1683897 0 71.9 35.4 4073.0
4 2022 4 1626972 0 78.6 38.9 4036.3
5 2023 1 1571120 0 86.4 43.9 4046.7
6 2023 2 1576271 0 95.6 50.0 3961.8
7 2023 3 1634919 0 101.2 52.7 4018.8
8 2023 4 1557569 0 110.4 57.9 4127.1
9 2024 1 1507360 0 116.2 62.0 4069.9
10 2024 2 1465318 0 116.9 63.4 4200.9
11 2024 3 1494384 0 116.7 63.8 4601.5
12 2024 4 1414189 0 122.3 69.1 4554.0
Speed tests above 2,000 Mbps on mobile are implausible (likely fixed broadband mislabeled). We cap at the 99th percentile per period.
[1] 18642596
Rows after outlier removal: 18642596
year quarter period_start
1 2022 1 2022-01-01
2 2022 2 2022-04-01
3 2022 3 2022-07-01
4 2022 4 2022-10-01
5 2023 1 2023-01-01
6 2023 2 2023-04-01
7 2023 3 2023-07-01
8 2023 4 2023-10-01
9 2024 1 2024-01-01
10 2024 2 2024-04-01
11 2024 3 2024-07-01
12 2024 4 2024-10-01
What fraction of rows were removed by the tests >= 3 filter? Is that filter appropriate — what is the tradeoff?
Plot a histogram of avg_d_mbps before and after outlier removal for one quarter of your choice.
Look up the Bing Maps quadkey spec. What geographic area does zoom level 16 cover? How does that compare to a typical cell tower coverage radius (~1–2 km)?