3  Data Preparation

3.1 Overview

Raw Ookla tiles cover the entire world. This chapter filters to the US, converts speeds to Mbps, derives time variables, and builds a clean analysis table in DuckDB.

3.2 Setup

3.3 What is a Quadkey?

Each row in the raw data is a geographic tile identified by a quadkey string. To filter to the US we need tile geometries. The quadkey package decodes these to bounding boxes.

[1] 18830917
Rows in ookla_us: 18830917

3.4 Speed Distribution

   year quarter   tiles min_dl mean_dl median_dl max_dl
1  2022       1 1618733      0    67.5      33.5 3942.9
2  2022       2 1680185      0    69.1      34.2 4079.6
3  2022       3 1683897      0    71.9      35.4 4073.0
4  2022       4 1626972      0    78.6      38.9 4036.3
5  2023       1 1571120      0    86.4      43.9 4046.7
6  2023       2 1576271      0    95.6      50.0 3961.8
7  2023       3 1634919      0   101.2      52.7 4018.8
8  2023       4 1557569      0   110.4      57.9 4127.1
9  2024       1 1507360      0   116.2      62.0 4069.9
10 2024       2 1465318      0   116.9      63.4 4200.9
11 2024       3 1494384      0   116.7      63.8 4601.5
12 2024       4 1414189      0   122.3      69.1 4554.0

3.5 Removing Outliers

Speed tests above 2,000 Mbps on mobile are implausible (likely fixed broadband mislabeled). We cap at the 99th percentile per period.

[1] 18642596
Rows after outlier removal: 18642596

3.6 Period Labels

   year quarter period_start
1  2022       1   2022-01-01
2  2022       2   2022-04-01
3  2022       3   2022-07-01
4  2022       4   2022-10-01
5  2023       1   2023-01-01
6  2023       2   2023-04-01
7  2023       3   2023-07-01
8  2023       4   2023-10-01
9  2024       1   2024-01-01
10 2024       2   2024-04-01
11 2024       3   2024-07-01
12 2024       4   2024-10-01

3.7 Problems

  1. What fraction of rows were removed by the tests >= 3 filter? Is that filter appropriate — what is the tradeoff?

  2. Plot a histogram of avg_d_mbps before and after outlier removal for one quarter of your choice.

  3. Look up the Bing Maps quadkey spec. What geographic area does zoom level 16 cover? How does that compare to a typical cell tower coverage radius (~1–2 km)?