Cluster Locations Using K-Means

The script takes a CSV of data collection locations and clusters them into n groups, where n is specified by the user. For example, n can be n_workers or n_workers*n_days. Locations are clustered with k-means, and the distance matrix is Euclidean by default.
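To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) with Euclidean distance in plain Python. It is not the code that cluster_kmeans.py runs. The function name kmeans, its signature, and its defaults are assumptions chosen to mirror the -n, -m and -r options, and the sample coordinates are made up.

```python
import math
import random


def kmeans(points, n_clusters, max_iter=300, random_state=0):
    """Cluster (lat, long) tuples with Lloyd's algorithm and Euclidean distance.

    Illustrative only: the name, signature and defaults are assumptions.
    Returns (labels, centroids) with labels numbered 1..n_clusters.
    """
    rng = random.Random(random_state)
    # Start from n_clusters input points picked at random.
    centroids = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    # Shift labels to 1..n to match the assigned_points convention.
    return [lab + 1 for lab in labels], centroids


# Made-up coordinates forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, n_clusters=2)
```

Which group receives label 1 depends on the random start, so only the grouping itself is meaningful, not the label values.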

Input:

  • n, the number of clusters (passed with -n/--n_workers)
  • A CSV file containing the lat/longs of the remaining points. For instance, the output of geo_sampling works after a few adjustments. The geo_sampling script produces a list of segments, each with a start and an end lat/long. Pick either the start or the end lat/long and rename the columns so that the file has 3 columns named id, lat, long
  • Distance function: -d euclidean, -d haversine or -d osrm. Default is -d euclidean.
  • Name of the output file. Default is cluster-kmeans-output.csv
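The column adjustment described above could be sketched as follows, using only the standard library. The source column names (segment_id, start_lat, start_long) are hypothetical, as are the sample rows; check the header of your own geo_sampling output and pass the real names. The helper to_cluster_input is likewise an illustration, not part of the package.

```python
import csv
import io


def to_cluster_input(src, dst, id_col="segment_id",
                     lat_col="start_lat", long_col="start_long"):
    """Copy one lat/long pair per segment into a 3-column id, lat, long CSV.

    src and dst are open text files (or file-like objects).
    The default column names are assumptions, not geo_sampling's real header.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["id", "lat", "long"])
    count = 0
    for row in reader:
        writer.writerow([row[id_col], row[lat_col], row[long_col]])
        count += 1
    return count


# Made-up sample standing in for a geo_sampling output file.
sample = io.StringIO(
    "segment_id,start_lat,start_long,end_lat,end_long\n"
    "1,13.36,100.98,13.37,100.99\n"
    "2,13.40,101.02,13.41,101.03\n"
)
out = io.StringIO()
n_rows = to_cluster_input(sample, out)
prepared = out.getvalue()
```

With real files, open the source with open(path, newline="") and the destination with open(path, "w", newline="") in place of the StringIO objects.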

Output:

  • Appends a new column, assigned_points, that gives the cluster assignment for each row of the lat/long file. The column takes integer values 1, 2, 3, ..., n
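As a quick check on the result, the assigned_points column can be tallied to see how many locations each cluster received. This is a small sketch with the standard library; the helper name cluster_sizes and the sample rows are made up for illustration, and only the column names id, lat, long and assigned_points come from the description above.

```python
import csv
import io
from collections import Counter


def cluster_sizes(f, column="assigned_points"):
    """Count rows per cluster label in an open output CSV."""
    # int(float(...)) also accepts labels written as "1.0".
    return Counter(int(float(row[column])) for row in csv.DictReader(f))


# Made-up rows standing in for cluster-kmeans-output.csv.
sample_output = io.StringIO(
    "id,lat,long,assigned_points\n"
    "1,13.36,100.98,1\n"
    "2,13.40,101.02,2\n"
    "3,13.37,100.97,1\n"
)
sizes = cluster_sizes(sample_output)
```

Very uneven counts may suggest trying a different n or distance function.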

Usage:

usage: cluster_kmeans.py [-h] -n N_WORKERS [-m MAX_ITER]
                        [-d {euclidean,haversine,osrm}] [-c CENTROIDS]
                        [-o OUTPUT] [-r RANDOM_STATE] [--plot]
                        [--osrm-base-url OSRM_BASE_URL]
                        [--osrm-max-table-size OSRM_MAX_TABLE_SIZE]
                        input

Random allocator based on K-Means clustering

positional arguments:
  input                 Road segments input file

optional arguments:
  -h, --help            show this help message and exit
  -n N_WORKERS, --n_workers N_WORKERS
                        Number of workers
  -m MAX_ITER, --max_iter MAX_ITER
                        Maximum number of iterations
  -d {euclidean,haversine,osrm}, --distance-func {euclidean,haversine,osrm}
                        Distance function for distance matrix
  -c CENTROIDS, --centroids CENTROIDS
                        Output file name of K-Means centroids
  -o OUTPUT, --output OUTPUT
                        Output file name
  -r RANDOM_STATE, --random-state RANDOM_STATE
                        Random state
  --plot                Plot the output
  --osrm-base-url OSRM_BASE_URL
                        Custom OSRM service URL
  --osrm-max-table-size OSRM_MAX_TABLE_SIZE
                        Maximum OSRM table size
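The -d option matters because Euclidean distance on raw lat/long degrees ignores that a degree of longitude shrinks away from the equator, while haversine measures great-circle distance on a sphere. The standard haversine formula can be sketched as below. The function name haversine_km and the radius constant are assumptions for illustration, not the package's own implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60 degrees north.
at_equator = haversine_km(0, 0, 0, 1)
at_60_north = haversine_km(60, 0, 60, 1)
```

The second distance is roughly half the first, which is the distortion that plain Euclidean distance on degrees does not capture.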

Examples:

python -m allocator.cluster_kmeans -n 10 allocator/examples/chonburi-roads-1k.csv --plot

The output file is saved as cluster-kmeans-output.csv unless a different name is given with -o/--output. The k-means centroids are saved as cluster-kmeans-centroids-output.csv unless another name is given with -c/--centroids.

To see the plot, please specify --plot

_images/cluster-kmeans-output.png