API Reference

DpMobilityReport

class dp_mobility_report.dpmreport.DpMobilityReport(df: DataFrame, tessellation: GeoDataFrame | None = None, privacy_budget: int | float | None = None, user_privacy: bool = True, max_trips_per_user: int | None = None, analysis_selection: List[str] | None = None, analysis_exclusion: List[str] | None = None, budget_split: dict = {}, timewindows: List[int] | ndarray = [2, 6, 10, 14, 18, 22], max_travel_time: int | None = None, bin_range_travel_time: int | None = None, max_jump_length: int | float | None = None, bin_range_jump_length: int | float | None = None, max_radius_of_gyration: int | float | None = None, bin_range_radius_of_gyration: int | float | None = None, max_user_tile_count: int | None = None, bin_range_user_tile_count: int | None = None, max_user_time_delta: int | float | None = None, bin_range_user_time_delta: int | float | None = None, subtitle: str | None = None, disable_progress_bar: bool = False, seed_sampling: int | None = None, evalu: bool = False)[source]

Generate a (differentially private) mobility report from a mobility dataset. The report will be generated as an HTML file, using the .to_file() method.

Parameters:

df – DataFrame containing the mobility data. Expected columns: User ID uid, trip ID tid, timestamp datetime (or int to indicate sequence position, if dataset only consists of sequences without timestamps), latitude lat and longitude lng in CRS EPSG:4326.
tessellation – Geopandas GeoDataFrame containing the tessellation for spatial aggregations. Expected columns: tile_id. If tessellation is not provided in the expected default CRS EPSG:4326, it will automatically be transformed. If no tessellation is provided, all analyses based on the tessellation will automatically be removed.
privacy_budget – Privacy budget for the differentially private report. Defaults to None, i.e., no privacy guarantee is provided.
user_privacy – Whether item-level or user-level privacy is applied. Defaults to True (user-level privacy).
max_trips_per_user – Maximum number of trips a user is allowed to contribute to the data. The dataset will be sampled accordingly. Defaults to None, i.e., all trips are used. In that case the actual maximum number of trips per user found in the data serves as the bound, which violates user-level differential privacy.
analysis_selection – Select only needed analyses. A selection reduces computation time and leaves more privacy budget for higher accuracy of other analyses. analysis_selection takes a list of all analyses to be included. Alternatively, a list of analyses to be excluded can be set with analysis_exclusion. Either entire segments can be included: const.OVERVIEW, const.PLACE_ANALYSIS, const.OD_ANALYSIS, const.USER_ANALYSIS or any single analysis can be included: const.DS_STATISTICS, const.MISSING_VALUES, const.TRIPS_OVER_TIME, const.TRIPS_PER_WEEKDAY, const.TRIPS_PER_HOUR, const.VISITS_PER_TILE, const.VISITS_PER_TIME_TILE, const.OD_FLOWS, const.TRAVEL_TIME, const.JUMP_LENGTH, const.TRIPS_PER_USER, const.USER_TIME_DELTA, const.RADIUS_OF_GYRATION, const.USER_TILE_COUNT, const.MOBILITY_ENTROPY. Default is None, i.e., all analyses are included.
analysis_exclusion – Ignored if analysis_selection is set. analysis_exclusion takes a list of all analyses to be excluded. Either entire segments can be excluded: const.OVERVIEW, const.PLACE_ANALYSIS, const.OD_ANALYSIS, const.USER_ANALYSIS or any single analysis can be excluded: const.DS_STATISTICS, const.MISSING_VALUES, const.TRIPS_OVER_TIME, const.TRIPS_PER_WEEKDAY, const.TRIPS_PER_HOUR, const.VISITS_PER_TILE, const.VISITS_PER_TIME_TILE, const.OD_FLOWS, const.TRAVEL_TIME, const.JUMP_LENGTH, const.TRIPS_PER_USER, const.USER_TIME_DELTA, const.RADIUS_OF_GYRATION, const.USER_TILE_COUNT, const.MOBILITY_ENTROPY. Default is None, i.e., no analyses are excluded.
budget_split – dict to customize how much privacy budget is assigned to which analysis. Each key needs to be named according to an analysis and the value needs to be an integer indicating the weight for the privacy budget. If no weight is assigned, a default weight of 1 is set. For example, if budget_split = {const.VISITS_PER_TILE: 10}, then the privacy budget for visits_per_tile is 10 times higher than for every other analysis, which all get a default weight of 1. Possible dict keys (all analyses): const.DS_STATISTICS, const.MISSING_VALUES, const.TRIPS_OVER_TIME, const.TRIPS_PER_WEEKDAY, const.TRIPS_PER_HOUR, const.VISITS_PER_TILE, const.VISITS_PER_TIME_TILE, const.OD_FLOWS, const.TRAVEL_TIME, const.JUMP_LENGTH, const.TRIPS_PER_USER, const.USER_TIME_DELTA, const.RADIUS_OF_GYRATION, const.USER_TILE_COUNT, const.MOBILITY_ENTROPY
timewindows – List of hours as int that define the timewindows for the spatial analysis for single time windows. Defaults to [2, 6, 10, 14, 18, 22].
max_travel_time – Upper bound for travel time histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_travel_time – The range a single histogram bin spans for travel time (e.g., 5 for 5 min bins). If None is given, the histogram bins will be determined automatically. Defaults to None.
max_jump_length – Upper bound for jump length histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_jump_length – The range a single histogram bin spans for jump length (e.g., 1 for 1 km bins). If None is given, the histogram bins will be determined automatically. Defaults to None.
max_radius_of_gyration – Upper bound for radius of gyration histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_radius_of_gyration – The range a single histogram bin spans for the radius of gyration (e.g., 1 for 1 km bins). If None is given, the histogram bins will be determined automatically. Defaults to None.
max_user_tile_count – Upper bound for distinct tiles per user histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_user_tile_count – The range a single histogram bin spans for the distinct tiles per user histogram. If None is given, the histogram bins will be determined automatically. Defaults to None.
max_user_time_delta – Upper bound for user time delta histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_user_time_delta – The range a single histogram bin spans for user time delta (e.g., 1 for 1 hour bins). If None is given, the histogram bins will be determined automatically. Defaults to None.
subtitle – Custom subtitle that appears at the top of the HTML report. Defaults to None.
disable_progress_bar – Whether progress bars should be hidden. Defaults to False, i.e., progress bars are shown.
seed_sampling – Provide seed for down-sampling of dataset (according to max_trips_per_user) so that the sampling is reproducible. Defaults to None, i.e., no seed.
evalu – Parameter only needed for development and evaluation purposes. Defaults to False.
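
The weighting described for budget_split can be illustrated with a small sketch. It assumes the total privacy budget is divided in proportion to the weights, which is consistent with the "10 times higher" example above but is not confirmed here as the library's exact internal rule. The function split_budget and the lowercase analysis names are hypothetical stand-ins for the const values.

```python
def split_budget(total_budget, analyses, budget_split=None):
    """Divide total_budget across analyses in proportion to their weights.

    Assumption: proportional allocation. Analyses without an explicit
    weight get the default weight of 1, as described for budget_split.
    """
    budget_split = budget_split or {}
    weights = {name: budget_split.get(name, 1) for name in analyses}
    total_weight = sum(weights.values())
    return {name: total_budget * w / total_weight for name, w in weights.items()}


# Hypothetical analysis names standing in for const.VISITS_PER_TILE etc.
analyses = ["visits_per_tile", "od_flows", "travel_time"]
shares = split_budget(6.0, analyses, {"visits_per_tile": 10})
```

With these inputs the weights are 10, 1 and 1, so visits_per_tile receives ten times the share of each other analysis, and the shares add up to the total budget.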

property analysis_exclusion: list: List of analyses that have been excluded from the report and similarity measures. If analysis_selection was provided as a parameter, it is inverted into this analysis_exclusion list.
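
The inversion of a selection into an exclusion list might look roughly like the sketch below. It covers single analyses only (not whole segments), and both ALL_ANALYSES with its lowercase names and the function resolve_exclusion are hypothetical stand-ins, not the library's own code.

```python
# Hypothetical lowercase names standing in for the const values listed above.
ALL_ANALYSES = [
    "ds_statistics", "missing_values", "trips_over_time", "trips_per_weekday",
    "trips_per_hour", "visits_per_tile", "visits_per_time_tile", "od_flows",
    "travel_time", "jump_length", "trips_per_user", "user_time_delta",
    "radius_of_gyration", "user_tile_count", "mobility_entropy",
]


def resolve_exclusion(analysis_selection=None, analysis_exclusion=None):
    """Return the effective exclusion list.

    A selection takes precedence: everything not selected is excluded and
    any analysis_exclusion argument is ignored.
    """
    if analysis_selection is not None:
        return [a for a in ALL_ANALYSES if a not in analysis_selection]
    return list(analysis_exclusion or [])


excluded = resolve_exclusion(
    analysis_selection=["od_flows", "travel_time"],
    analysis_exclusion=["od_flows"],  # ignored because a selection is given
)
```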

property budget_split: dict: Budget split as specified in the parameters.

property df: DataFrame: DataFrame containing the processed input mobility data of the report.

property max_trips_per_user: int: Maximum number of trips per user as specified in the parameters. If None was given, this equals the actual maximum according to the data.
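
To make the interplay of max_trips_per_user and seed_sampling concrete, here is a minimal sketch of seeded down-sampling on (uid, tid) pairs. It uses only the standard library, and the helper names are hypothetical; the library's actual sampling procedure may differ.

```python
import random
from collections import defaultdict


def trips_by_user(trips):
    """Group distinct trip IDs by user from an iterable of (uid, tid) pairs."""
    grouped = defaultdict(set)
    for uid, tid in trips:
        grouped[uid].add(tid)
    return grouped


def effective_max_trips(trips, max_trips_per_user=None):
    """If no cap is given, fall back to the actual maximum found in the data."""
    if max_trips_per_user is not None:
        return max_trips_per_user
    return max(len(tids) for tids in trips_by_user(trips).values())


def cap_trips_per_user(trips, max_trips_per_user, seed=None):
    """Keep at most max_trips_per_user trips per user, sampled reproducibly."""
    rng = random.Random(seed)
    kept = set()
    grouped = trips_by_user(trips)
    for uid in sorted(grouped):
        tids = sorted(grouped[uid])
        if len(tids) > max_trips_per_user:
            tids = rng.sample(tids, max_trips_per_user)
        kept.update((uid, tid) for tid in tids)
    return kept


trips = [("a", 1), ("a", 2), ("a", 3), ("b", 1)]
sample_1 = cap_trips_per_user(trips, 2, seed=42)
sample_2 = cap_trips_per_user(trips, 2, seed=42)
```

Using the same seed twice yields the same sample, which is the reproducibility that seed_sampling is meant to provide.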

property privacy_budget: int | float: Privacy budget as specified in the parameters.

property report: dict: A dictionary with all report elements (i.e., analyses).

property tessellation: GeoDataFrame: Processed tessellation.

to_file(output_file: str | Path, disable_progress_bar: bool | None = None, top_n_flows: int = 100) → None[source]

Write the report to an HTML file.

Parameters:

output_file – The name or the path of the file to store the HTML output.
disable_progress_bar – If True, no progress bar is shown.
top_n_flows – Determines how many of the top n origin-destination flows are displayed. Defaults to 100.
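
A typical call sequence might look like the following sketch. The column check reflects the expected columns listed for df; the helper names missing_columns and write_private_report, as well as the chosen parameter values, are illustrative assumptions. The package is imported inside the function so that the column check can be used on its own.

```python
REQUIRED_COLUMNS = {"uid", "tid", "datetime", "lat", "lng"}


def missing_columns(columns):
    """Return the expected input columns that are absent, sorted by name."""
    return sorted(REQUIRED_COLUMNS - set(columns))


def write_private_report(df, tessellation, output_file):
    """Build a differentially private report and write it as HTML.

    df is expected to be a pandas DataFrame, tessellation a GeoDataFrame
    with a tile_id column (or None).
    """
    missing = missing_columns(df.columns)
    if missing:
        raise ValueError(f"input data lacks expected columns: {missing}")

    # Imported lazily; requires the dp_mobility_report package.
    from dp_mobility_report.dpmreport import DpMobilityReport

    report = DpMobilityReport(
        df,
        tessellation,
        privacy_budget=1.0,      # illustrative value
        max_trips_per_user=5,    # illustrative value
        seed_sampling=0,         # makes the down-sampling reproducible
    )
    report.to_file(output_file, top_n_flows=50)
```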

BenchmarkReport

class dp_mobility_report.benchmark.benchmarkreport.BenchmarkReport(df_base: DataFrame, tessellation: GeoDataFrame | None = None, df_alternative: DataFrame | None = None, privacy_budget_base: int | float | None = None, privacy_budget_alternative: int | float | None = None, user_privacy_base: bool = True, user_privacy_alternative: bool = True, max_trips_per_user_base: int | None = None, max_trips_per_user_alternative: int | None = None, analysis_selection: List[str] | None = None, analysis_exclusion: List[str] | None = None, budget_split_base: dict = {}, budget_split_alternative: dict = {}, timewindows: List[int] | ndarray = [2, 6, 10, 14, 18, 22], max_travel_time: int = 120, bin_range_travel_time: int = 5, max_jump_length: int | float = 10, bin_range_jump_length: int | float = 1, max_radius_of_gyration: int | float = 5, bin_range_radius_of_gyration: int | float = 0.5, max_user_tile_count: int = 10, bin_range_user_tile_count: int = 1, max_user_time_delta: int | float = 48, bin_range_user_time_delta: int | float = 4, top_n_ranking: List[int] = [10, 50, 100], measure_selection: dict | None = None, subtitle: str | None = None, disable_progress_bar: bool = False, seed_sampling: int | None = None, evalu: bool = False)[source]

Evaluate the similarity of two (differentially private) mobility reports from one or two mobility datasets. The comparison can be based on two datasets (df_base and df_alternative) or on one dataset (df_base) with different privacy settings. The arguments df, privacy_budget, user_privacy, max_trips_per_user and budget_split can differ between the two reports and are set with the suffixes _base and _alternative. All other arguments apply to both reports. For the evaluation, similarity measures are computed to quantify the statistical similarity for each analysis: the symmetric mean absolute percentage error (SMAPE), Jensen-Shannon divergence (JSD), Kullback-Leibler divergence (KLD), the earth mover’s distance (EMD), the Kendall correlation coefficient (KT) and the top n coverage (TOP_N_COV). The evaluation, i.e., the benchmark report, is generated as an HTML file with the .to_file() method.
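
As a rough sketch of the single-dataset case, the function below compares one dataset against itself under two privacy budgets and returns the similarity measures. The function name and the chosen values are illustrative assumptions; only the keyword arguments and the similarity_measures property come from this reference.

```python
def compare_privacy_budgets(df, tessellation, budget_base=None, budget_alternative=1.0):
    """Benchmark one dataset under two privacy budgets.

    df_alternative is left unset, so df is used for both reports.
    """
    # Imported lazily; requires the dp_mobility_report package.
    from dp_mobility_report.benchmark.benchmarkreport import BenchmarkReport

    benchmark = BenchmarkReport(
        df_base=df,
        tessellation=tessellation,
        privacy_budget_base=budget_base,                # None: no privacy guarantee
        privacy_budget_alternative=budget_alternative,  # illustrative value
        max_trips_per_user_base=5,
        max_trips_per_user_alternative=5,
        seed_sampling=0,
    )
    return benchmark.similarity_measures
```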

Parameters:

df_base – DataFrame containing the baseline mobility data, see argument df of DpMobilityReport.
tessellation – Geopandas GeoDataFrame containing the tessellation for spatial aggregations. Expected columns: tile_id. If tessellation is not provided in the expected default CRS EPSG:4326, it will automatically be transformed. If no tessellation is provided, all analyses based on the tessellation will automatically be removed.
df_alternative – DataFrame containing the alternative mobility data to be compared against the baseline dataset, see argument df of DpMobilityReport. If None, df_base is used for both reports.
privacy_budget_base – Privacy budget for the differentially private base report. Defaults to None, i.e., no privacy guarantee is provided.
privacy_budget_alternative – Privacy budget for the differentially private alternative report. Defaults to None, i.e., no privacy guarantee is provided.
user_privacy_base – Whether item-level or user-level privacy is applied for the base report. Defaults to True (user-level privacy).
user_privacy_alternative – Whether item-level or user-level privacy is applied for the alternative report. Defaults to True (user-level privacy).
max_trips_per_user_base – Maximum number of trips a user is allowed to contribute to the base dataset. The dataset will be sampled accordingly. Defaults to None, i.e., all trips are included.
max_trips_per_user_alternative – Maximum number of trips a user is allowed to contribute to the alternative dataset. The dataset will be sampled accordingly. Defaults to None, i.e., all trips are included.
analysis_selection – Select only needed analyses, see argument analysis_selection of DpMobilityReport.
analysis_exclusion – Ignored if analysis_selection is set. Exclude analyses that are not needed, see argument analysis_exclusion of DpMobilityReport.
budget_split_base – dict to customize how much privacy budget is assigned to which analysis. See argument budget_split of DpMobilityReport.
budget_split_alternative – dict to customize how much privacy budget is assigned to which analysis. See argument budget_split of DpMobilityReport.
timewindows – List of hours as int that define the timewindows for the spatial analysis for single time windows. Defaults to [2, 6, 10, 14, 18, 22].
max_travel_time – Upper bound for travel time histogram. Defaults to 120 (mins).
bin_range_travel_time – The range a single histogram bin spans for travel time (e.g., 5 for 5 min bins). Defaults to 5 (min).
max_jump_length – Upper bound for jump length histogram. Defaults to 10 (km).
bin_range_jump_length – The range a single histogram bin spans for jump length (e.g., 1 for 1 km bins). Defaults to 1 (km).
max_radius_of_gyration – Upper bound for radius of gyration histogram. Defaults to 5 (km).
bin_range_radius_of_gyration – The range a single histogram bin spans for the radius of gyration (e.g., 1 for 1 km bins). Defaults to 0.5 (km).
max_user_tile_count – Upper bound for distinct tiles per user histogram. Defaults to 10.
bin_range_user_tile_count – The range a single histogram bin spans for the distinct tiles per user histogram. Defaults to 1.
max_user_time_delta – Upper bound for user time delta histogram. Defaults to 48 (hours).
bin_range_user_time_delta – The range a single histogram bin spans for user time delta (e.g., 1 for 1 hour bins). Defaults to 4 (hours).
top_n_ranking – List of ‘top n’ values that are used to compute the Kendall correlation coefficient and the top n coverage for ranking similarity measures. Values need to be integers > 0. Defaults to [10, 50, 100].
measure_selection – Select similarity measure for each analysis that is used for the similarity_measures property of the BenchmarkReport. If None, the default from default_measure_selection() will be used.
subtitle – Custom subtitle that appears at the top of the HTML report. Defaults to None.
disable_progress_bar – Whether progress bars should be hidden. Defaults to False, i.e., progress bars are shown.
seed_sampling – Provide seed for down-sampling of dataset (according to max_trips_per_user) so that the sampling is reproducible. Defaults to None, i.e., no seed.
evalu – Parameter only needed for development and evaluation purposes. Defaults to False.
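
The max_* and bin_range_* defaults above jointly determine a fixed set of histogram bins, which is what makes the two reports comparable bin by bin. A minimal sketch of the resulting bin edges follows; it assumes bins start at 0 and that the upper bound is a multiple of the bin range. How values above the upper bound are handled is not covered here, and bin_edges is a hypothetical helper.

```python
def bin_edges(max_value, bin_range):
    """Return equally spaced bin edges from 0 up to max_value.

    Assumption: max_value is a multiple of bin_range.
    """
    n_bins = round(max_value / bin_range)
    return [i * bin_range for i in range(n_bins + 1)]


# Defaults documented for BenchmarkReport.
travel_time_edges = bin_edges(120, 5)   # minutes
radius_edges = bin_edges(5, 0.5)        # kilometres
```

With the documented defaults this gives 24 travel time bins of 5 minutes each and 10 radius of gyration bins of 0.5 km each.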

property emd: dict: The earth mover’s distance between base and alternative of all selected analyses, where applicable.
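
For intuition, the earth mover's distance between two one-dimensional histograms with identical, equally wide bins can be computed from the cumulative differences. This is a generic sketch of that special case only; it is not claimed to be how the library computes EMD, in particular for spatial analyses, and emd_1d is a hypothetical name.

```python
def emd_1d(counts_base, counts_alternative, bin_width=1.0):
    """Earth mover's distance between two histograms over the same bins.

    Both histograms are normalised to sum to 1; the distance is the bin
    width times the summed absolute difference of the cumulative sums.
    """
    total_base = sum(counts_base)
    total_alt = sum(counts_alternative)
    cumulative_diff = 0.0
    distance = 0.0
    for c_base, c_alt in zip(counts_base, counts_alternative):
        cumulative_diff += c_base / total_base - c_alt / total_alt
        distance += abs(cumulative_diff)
    return distance * bin_width


# All mass in the first bin versus all mass in the third bin:
# the mass has to be moved by two bins.
shifted = emd_1d([4, 0, 0], [0, 0, 4])
```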

property jsd: dict: The Jensen-Shannon divergence between base and alternative of all selected analyses, where applicable.

property kld: dict: The Kullback-Leibler divergence between base and alternative of all selected analyses, where applicable.
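
The two divergences can be sketched in plain Python as follows, using their textbook definitions with base-2 logarithms. This is an illustration of the measures, not the library's implementation; note that some libraries report the square root of the Jensen-Shannon divergence (the Jensen-Shannon distance) instead.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Infinite if q assigns zero probability where p does not.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log2(p_i / q_i)
    return total


def jsd(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


p = normalize([3, 0])
q = normalize([0, 5])
```

For the two disjoint distributions p and q, the KLD is infinite while the JSD reaches its maximum of 1, which illustrates why the JSD is defined where the KLD is not.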

property measure_selection: dict: The specified selected similarity measure for each analysis.

property report_alternative: DpMobilityReport: The alternative DpMobilityReport

property report_base: DpMobilityReport: The base DpMobilityReport

property similarity_measures: dict: Similarity measures according to measure_selection.

property smape: dict: The symmetric (mean absolute) percentage error, based on the relative error, between base and alternative of all selected analyses, where applicable.
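
A common formulation of the symmetric percentage error divides the absolute difference by the mean of the absolute values, which bounds each error between 0 and 2. The sketch below uses that formulation and treats the case where both values are zero as no error; both choices are assumptions, and the library's exact variant may differ.

```python
def symmetric_percentage_error(base, alternative):
    """Absolute difference relative to the mean magnitude of both values."""
    denominator = (abs(base) + abs(alternative)) / 2
    if denominator == 0:
        return 0.0  # assumption: two zeros count as identical
    return abs(alternative - base) / denominator


def smape(base_values, alternative_values):
    """Mean of the symmetric percentage errors over paired values."""
    errors = [
        symmetric_percentage_error(b, a)
        for b, a in zip(base_values, alternative_values)
    ]
    return sum(errors) / len(errors)


# One identical pair and one pair where the base value is zero.
example = smape([100, 0], [100, 10])
```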