API Reference
DpMobilityReport
- class dp_mobility_report.dpmreport.DpMobilityReport(df: DataFrame, tessellation: GeoDataFrame | None = None, privacy_budget: int | float | None = None, user_privacy: bool = True, max_trips_per_user: int | None = None, analysis_selection: List[str] | None = None, analysis_exclusion: List[str] | None = None, budget_split: dict = {}, timewindows: List[int] | ndarray = [2, 6, 10, 14, 18, 22], max_travel_time: int | None = None, bin_range_travel_time: int | None = None, max_jump_length: int | float | None = None, bin_range_jump_length: int | float | None = None, max_radius_of_gyration: int | float | None = None, bin_range_radius_of_gyration: int | float | None = None, max_user_tile_count: int | None = None, bin_range_user_tile_count: int | None = None, max_user_time_delta: int | float | None = None, bin_range_user_time_delta: int | float | None = None, subtitle: str | None = None, disable_progress_bar: bool = False, seed_sampling: int | None = None, evalu: bool = False)
Generate a (differentially private) mobility report from a mobility dataset. The report is generated as an HTML file using the .to_file() method.
- Parameters:
df – DataFrame containing the mobility data. Expected columns: user ID (uid), trip ID (tid), timestamp (datetime, or int to indicate the sequence position if the dataset only consists of sequences without timestamps), latitude (lat) and longitude (lng) in CRS EPSG:4326.
tessellation – Geopandas GeoDataFrame containing the tessellation for spatial aggregations. Expected columns: tile_id. If the tessellation is not provided in the expected default CRS EPSG:4326, it is transformed automatically. If no tessellation is provided, all analyses based on the tessellation are removed automatically.
privacy_budget – Privacy budget for the differentially private report. Defaults to None, i.e., no privacy guarantee is provided.
user_privacy – Whether user-level (True) or item-level (False) privacy is applied. Defaults to True (user-level privacy).
max_trips_per_user – Maximum number of trips a user is allowed to contribute to the data. The dataset is sampled accordingly. Defaults to None, i.e., all trips are used. In that case the actual maximum number of trips per user is taken from the data, though this violates user-level differential privacy.
analysis_selection – Select only the needed analyses. A selection reduces computation time and leaves more privacy budget for higher accuracy of the other analyses. analysis_selection takes a list of all analyses to be included. Alternatively, a list of analyses to be excluded can be set with analysis_exclusion. Either entire segments can be included (const.OVERVIEW, const.PLACE_ANALYSIS, const.OD_ANALYSIS, const.USER_ANALYSIS) or any single analysis (const.DS_STATISTICS, const.MISSING_VALUES, const.TRIPS_OVER_TIME, const.TRIPS_PER_WEEKDAY, const.TRIPS_PER_HOUR, const.VISITS_PER_TILE, const.VISITS_PER_TIME_TILE, const.OD_FLOWS, const.TRAVEL_TIME, const.JUMP_LENGTH, const.TRIPS_PER_USER, const.USER_TIME_DELTA, const.RADIUS_OF_GYRATION, const.USER_TILE_COUNT, const.MOBILITY_ENTROPY). Defaults to None, i.e., all analyses are included.
analysis_exclusion – Ignored if analysis_selection is set. analysis_exclusion takes a list of all analyses to be excluded. Either entire segments can be excluded (const.OVERVIEW, const.PLACE_ANALYSIS, const.OD_ANALYSIS, const.USER_ANALYSIS) or any single analysis (const.DS_STATISTICS, const.MISSING_VALUES, const.TRIPS_OVER_TIME, const.TRIPS_PER_WEEKDAY, const.TRIPS_PER_HOUR, const.VISITS_PER_TILE, const.VISITS_PER_TIME_TILE, const.OD_FLOWS, const.TRAVEL_TIME, const.JUMP_LENGTH, const.TRIPS_PER_USER, const.USER_TIME_DELTA, const.RADIUS_OF_GYRATION, const.USER_TILE_COUNT, const.MOBILITY_ENTROPY).
budget_split – dict to customize how much privacy budget is assigned to which analysis. Each key needs to be named according to an analysis and each value needs to be an integer indicating the weight for the privacy budget. If no weight is assigned, a default weight of 1 is set. For example, if budget_split = {const.VISITS_PER_TILE: 10}, then the privacy budget for visits_per_tile is 10 times higher than for every other analysis, which all get a default weight of 1. Possible dict keys (all analyses): const.DS_STATISTICS, const.MISSING_VALUES, const.TRIPS_OVER_TIME, const.TRIPS_PER_WEEKDAY, const.TRIPS_PER_HOUR, const.VISITS_PER_TILE, const.VISITS_PER_TIME_TILE, const.OD_FLOWS, const.TRAVEL_TIME, const.JUMP_LENGTH, const.TRIPS_PER_USER, const.USER_TIME_DELTA, const.RADIUS_OF_GYRATION, const.USER_TILE_COUNT, const.MOBILITY_ENTROPY.
timewindows – List of hours as int that define the time windows for the spatial analysis of single time windows. Defaults to [2, 6, 10, 14, 18, 22].
max_travel_time – Upper bound for the travel time histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_travel_time – The range a single histogram bin spans for travel time (e.g., 5 for 5 min bins). If None is given, the histogram bins are determined automatically. Defaults to None.
max_jump_length – Upper bound for the jump length histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_jump_length – The range a single histogram bin spans for jump length (e.g., 1 for 1 km bins). If None is given, the histogram bins are determined automatically. Defaults to None.
max_radius_of_gyration – Upper bound for the radius of gyration histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_radius_of_gyration – The range a single histogram bin spans for the radius of gyration (e.g., 1 for 1 km bins). If None is given, the histogram bins are determined automatically. Defaults to None.
max_user_tile_count – Upper bound for the distinct tiles per user histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_user_tile_count – The range a single histogram bin spans for the distinct tiles per user histogram. If None is given, the histogram bins are determined automatically. Defaults to None.
max_user_time_delta – Upper bound for the user time delta histogram. If None is given, no upper bound is set. Defaults to None.
bin_range_user_time_delta – The range a single histogram bin spans for user time delta (e.g., 1 for 1 hour bins). If None is given, the histogram bins are determined automatically. Defaults to None.
subtitle – Custom subtitle that appears at the top of the HTML report. Defaults to None.
disable_progress_bar – Whether progress bars should be disabled. Defaults to False, i.e., progress bars are shown.
seed_sampling – Seed for the down-sampling of the dataset (according to max_trips_per_user) so that the sampling is reproducible. Defaults to None, i.e., no seed.
evalu – Parameter only needed for development and evaluation purposes. Defaults to False.
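The weighting described for budget_split can be illustrated with a small sketch. It assumes the total privacy budget is divided among the analyses in proportion to their weights; this is consistent with the example above but is not confirmed as the library's exact procedure. The function split_privacy_budget and the analysis names are hypothetical and used for illustration only.

```python
from typing import Dict, List


def split_privacy_budget(
    privacy_budget: float,
    analyses: List[str],
    budget_split: Dict[str, int],
) -> Dict[str, float]:
    """Divide a total budget across analyses in proportion to their weights.

    Assumption: a proportional split; the library may distribute differently.
    """
    # Analyses that are not listed in budget_split get the default weight 1.
    weights = {name: budget_split.get(name, 1) for name in analyses}
    total_weight = sum(weights.values())
    return {name: privacy_budget * w / total_weight for name, w in weights.items()}


# Hypothetical analysis names, for illustration only.
analyses = ["visits_per_tile", "od_flows", "travel_time"]
shares = split_privacy_budget(12.0, analyses, {"visits_per_tile": 10})
```

With a total budget of 12 and weights 10, 1 and 1, visits_per_tile receives ten times the share of each other analysis.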
- property analysis_exclusion: list
List of analyses that have been excluded from the report and similarity measures. If analysis_selection was provided as a parameter, it is inverted into this analysis_exclusion list.
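The inversion of a selection into an exclusion list can be pictured as a set difference against the list of all analyses. The following is a minimal sketch under that assumption; resolve_exclusion and the lower-case analysis names are hypothetical, and the expansion of whole segments such as const.OVERVIEW is left out.

```python
from typing import List, Optional

# Hypothetical subset of analysis names; the library defines the real ones
# as constants (const.DS_STATISTICS, const.OD_FLOWS, ...).
ALL_ANALYSES = [
    "ds_statistics",
    "trips_over_time",
    "visits_per_tile",
    "od_flows",
    "travel_time",
]


def resolve_exclusion(
    selection: Optional[List[str]], exclusion: Optional[List[str]]
) -> List[str]:
    """Return the effective exclusion list for the given parameters."""
    if selection is not None:
        # A selection takes precedence: everything not selected is excluded,
        # and any explicitly passed exclusion list is ignored.
        return [a for a in ALL_ANALYSES if a not in selection]
    return list(exclusion or [])
```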
- property budget_split: dict
Budget split as specified in the parameters.
- property df: DataFrame
DataFrame containing the processed input mobility data of the report.
- property max_trips_per_user: int
Maximum number of trips per user as specified in the parameters. If None was given, this equals the actual maximum according to the data.
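To make the role of max_trips_per_user and seed_sampling concrete, here is a minimal sketch of seeded down-sampling of trips per user. It assumes trips are given as (uid, tid) pairs and that surplus trips are dropped uniformly at random; the function sample_trips is hypothetical and not the library's implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def sample_trips(
    trips: List[Tuple[str, int]],
    max_trips_per_user: int,
    seed: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Keep at most max_trips_per_user trips for every user."""
    rng = random.Random(seed)  # a fixed seed makes the sampling reproducible
    by_user: Dict[str, List[int]] = defaultdict(list)
    for uid, tid in trips:
        by_user[uid].append(tid)

    kept: List[Tuple[str, int]] = []
    for uid, tids in by_user.items():
        if len(tids) > max_trips_per_user:
            # Draw a random subset without replacement for this user.
            tids = rng.sample(tids, max_trips_per_user)
        kept.extend((uid, tid) for tid in tids)
    return kept


trips = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]
sampled = sample_trips(trips, max_trips_per_user=2, seed=42)
```

Calling the function twice with the same seed returns the same subset, which is the purpose of a sampling seed.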
- property privacy_budget: int | float
Privacy budget as specified in the parameters.
- property report: dict
A dictionary with all report elements (i.e., analyses).
- property tessellation: GeoDataFrame
Processed tessellation.
- to_file(output_file: str | Path, disable_progress_bar: bool | None = None, top_n_flows: int = 100) -> None
Write the report to a file. By default a name is generated.
- Parameters:
output_file – The name or the path of the file to store the html output.
disable_progress_bar – If True, no progress bar is shown.
top_n_flows – Determines how many of the top n origin-destination flows are displayed. Defaults to 100.
BenchmarkReport
- class dp_mobility_report.benchmark.benchmarkreport.BenchmarkReport(df_base: DataFrame, tessellation: GeoDataFrame | None = None, df_alternative: DataFrame | None = None, privacy_budget_base: int | float | None = None, privacy_budget_alternative: int | float | None = None, user_privacy_base: bool = True, user_privacy_alternative: bool = True, max_trips_per_user_base: int | None = None, max_trips_per_user_alternative: int | None = None, analysis_selection: List[str] | None = None, analysis_exclusion: List[str] | None = None, budget_split_base: dict = {}, budget_split_alternative: dict = {}, timewindows: List[int] | ndarray = [2, 6, 10, 14, 18, 22], max_travel_time: int = 120, bin_range_travel_time: int = 5, max_jump_length: int | float = 10, bin_range_jump_length: int | float = 1, max_radius_of_gyration: int | float = 5, bin_range_radius_of_gyration: int | float = 0.5, max_user_tile_count: int = 10, bin_range_user_tile_count: int = 1, max_user_time_delta: int | float = 48, bin_range_user_time_delta: int | float = 4, top_n_ranking: List[int] = [10, 50, 100], measure_selection: dict | None = None, subtitle: str | None = None, disable_progress_bar: bool = False, seed_sampling: int | None = None, evalu: bool = False)
Evaluate the similarity of two (differentially private) mobility reports from one or two mobility datasets.
This can be based on two datasets (df_base and df_alternative) or one dataset (df_base) with different privacy settings. The arguments df, privacy_budget, user_privacy, max_trips_per_user and budget_split can differ between the two reports and are set with the corresponding suffix _base or _alternative. The other arguments are the same for both reports. For the evaluation, similarity measures (namely the symmetric mean absolute percentage error (SMAPE), Jensen-Shannon divergence (JSD), Kullback-Leibler divergence (KLD), the earth mover’s distance (EMD), the Kendall correlation coefficient (KT) and the top n coverage (TOP_N_COV)) are computed to quantify the statistical similarity for each analysis. The evaluation, i.e., the benchmark report, is generated as an HTML file using the .to_file() method.
- Parameters:
df_base – DataFrame containing the baseline mobility data, see argument df of DpMobilityReport.
tessellation – Geopandas GeoDataFrame containing the tessellation for spatial aggregations. Expected columns: tile_id. If the tessellation is not provided in the expected default CRS EPSG:4326, it is transformed automatically. If no tessellation is provided, all analyses based on the tessellation are removed automatically.
df_alternative – DataFrame containing the alternative mobility data to be compared against the baseline dataset, see argument df of DpMobilityReport. If None, df_base is used for both reports.
privacy_budget_base – Privacy budget for the differentially private base report. Defaults to None, i.e., no privacy guarantee is provided.
privacy_budget_alternative – Privacy budget for the differentially private alternative report. Defaults to None, i.e., no privacy guarantee is provided.
user_privacy_base – Whether user-level (True) or item-level (False) privacy is applied for the base report. Defaults to True (user-level privacy).
user_privacy_alternative – Whether user-level (True) or item-level (False) privacy is applied for the alternative report. Defaults to True (user-level privacy).
max_trips_per_user_base – Maximum number of trips a user shall contribute to the data of the base report. The dataset is sampled accordingly. Defaults to None, i.e., all trips are included.
max_trips_per_user_alternative – Maximum number of trips a user shall contribute to the data of the alternative report. The dataset is sampled accordingly. Defaults to None, i.e., all trips are included.
analysis_selection – Select only the needed analyses, see argument analysis_selection of DpMobilityReport.
analysis_exclusion – Ignored if analysis_selection is set. Exclude analyses that are not needed, see argument analysis_exclusion of DpMobilityReport.
budget_split_base – dict to customize how much privacy budget is assigned to which analysis of the base report. See argument budget_split of DpMobilityReport.
budget_split_alternative – dict to customize how much privacy budget is assigned to which analysis of the alternative report. See argument budget_split of DpMobilityReport.
timewindows – List of hours as int that define the time windows for the spatial analysis of single time windows. Defaults to [2, 6, 10, 14, 18, 22].
max_travel_time – Upper bound for the travel time histogram. Defaults to 120 (min).
bin_range_travel_time – The range a single histogram bin spans for travel time (e.g., 5 for 5 min bins). Defaults to 5 (min).
max_jump_length – Upper bound for the jump length histogram. Defaults to 10 (km).
bin_range_jump_length – The range a single histogram bin spans for jump length (e.g., 1 for 1 km bins). Defaults to 1 (km).
max_radius_of_gyration – Upper bound for the radius of gyration histogram. Defaults to 5 (km).
bin_range_radius_of_gyration – The range a single histogram bin spans for the radius of gyration (e.g., 1 for 1 km bins). Defaults to 0.5 (km).
max_user_tile_count – Upper bound for the distinct tiles per user histogram. Defaults to 10.
bin_range_user_tile_count – The range a single histogram bin spans for the distinct tiles per user histogram. Defaults to 1.
max_user_time_delta – Upper bound for the user time delta histogram. Defaults to 48 (hours).
bin_range_user_time_delta – The range a single histogram bin spans for user time delta (e.g., 1 for 1 hour bins). Defaults to 4 (hours).
top_n_ranking – List of ‘top n’ values that are used to compute the Kendall correlation coefficient and the top n coverage for ranking similarity measures. Values need to be integers > 0. Defaults to [10, 50, 100].
measure_selection – Select the similarity measure for each analysis that is used for the similarity_measures property of the BenchmarkReport. If None, the default from default_measure_selection() is used.
subtitle – Custom subtitle that appears at the top of the HTML report. Defaults to None.
disable_progress_bar – Whether progress bars should be disabled. Defaults to False, i.e., progress bars are shown.
seed_sampling – Seed for the down-sampling of the dataset (according to max_trips_per_user) so that the sampling is reproducible. Defaults to None, i.e., no seed.
evalu – Parameter only needed for development and evaluation purposes. Defaults to False.
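The max_* and bin_range_* defaults above describe fixed histogram bins, for example travel times up to 120 minutes in 5-minute steps. A minimal sketch of such binning follows. It assumes non-negative values and that values at or above the upper bound are collected in one final overflow bin; the source does not state how the library treats them, so this is an assumption. The function histogram_counts is hypothetical.

```python
from typing import List


def histogram_counts(values: List[float], max_value: float, bin_range: float) -> List[int]:
    """Count non-negative values in bins of width bin_range up to max_value.

    Assumption: the last entry is an overflow bin for values >= max_value.
    """
    n_bins = int(round(max_value / bin_range))
    counts = [0] * (n_bins + 1)
    for v in values:
        if v >= max_value:
            counts[-1] += 1  # overflow bin
        else:
            counts[int(v // bin_range)] += 1
    return counts


# Travel times in minutes, binned with the BenchmarkReport defaults (120, 5).
travel_time_counts = histogram_counts([3, 7, 119.9, 120, 500], max_value=120, bin_range=5)
```

Fixing the bins in this way gives both reports of a benchmark identical bin edges, so their histograms can be compared bin by bin.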
- property emd: dict
The earth mover’s distance between base and alternative of all selected analyses, where applicable.
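For intuition, the earth mover’s distance between two histograms over the same equally spaced bins can be computed from the difference of their cumulative distributions. The sketch below shows this textbook one-dimensional case; emd_1d is a hypothetical helper and not necessarily how the library computes the measure.

```python
from itertools import accumulate
from typing import Sequence


def emd_1d(base: Sequence[float], alternative: Sequence[float], bin_width: float = 1.0) -> float:
    """Earth mover's distance between two histograms on identical bins."""
    base_total, alt_total = sum(base), sum(alternative)
    # Normalize the counts so that both histograms carry a total mass of 1.
    p = [x / base_total for x in base]
    q = [x / alt_total for x in alternative]
    # In one dimension the EMD equals the area between the two CDFs.
    cdf_diffs = (abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))
    return bin_width * sum(cdf_diffs)
```

Moving all mass from the first to the third bin, as in emd_1d([1, 0, 0], [0, 0, 1]), costs a distance of two bin widths.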
- property jsd: dict
The Jensen-Shannon divergence between base and alternative of all selected analyses, where applicable.
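As a reference point, the Jensen-Shannon divergence of two count vectors can be sketched as follows. The example uses the base-2 logarithm, which bounds the result between 0 and 1, and returns the divergence itself rather than its square root (the Jensen-Shannon distance). Which variant the library reports is not stated here; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(counts: Sequence[float]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]


def _kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(base: Sequence[float], alternative: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two count vectors."""
    p, q = _normalize(base), _normalize(alternative)
    # The mixture m is positive wherever p or q is positive,
    # so both KL terms below are always finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
```

Identical distributions yield 0 and distributions without any overlap yield 1.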
- property kld: dict
The Kullback-Leibler divergence between base and alternative of all selected analyses, where applicable.
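The Kullback-Leibler divergence is asymmetric and becomes infinite when the second distribution assigns zero probability to a bin that the first one uses, which is one reason it is only available "where applicable". A minimal sketch with the natural logarithm follows. Treating the base report as the first argument is an assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence


def kullback_leibler_divergence(base: Sequence[float], alternative: Sequence[float]) -> float:
    """KL divergence D(base || alternative) of two count vectors, in nats."""
    base_total, alt_total = sum(base), sum(alternative)
    divergence = 0.0
    for b, a in zip(base, alternative):
        p, q = b / base_total, a / alt_total
        if p == 0:
            continue  # a term with p == 0 contributes nothing
        if q == 0:
            return math.inf  # base has mass where alternative has none
        divergence += p * math.log(p / q)
    return divergence
```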
- property measure_selection: dict
The similarity measure selected for each analysis.
- property report_alternative: DpMobilityReport
The alternative DpMobilityReport.
- property report_base: DpMobilityReport
The base DpMobilityReport.
- property similarity_measures: dict
Similarity measures according to measure_selection.
- property smape: dict
The symmetric (mean absolute) percentage error, based on the relative error, between base and alternative of all selected analyses, where applicable.
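Several definitions of the symmetric mean absolute percentage error are in use. The sketch below shows one common variant, which divides each absolute error by the mean of the two absolute values and therefore ranges from 0 to 2. Whether the library uses exactly this variant is an assumption, and the function smape is a hypothetical helper.

```python
from typing import Sequence


def smape(base: Sequence[float], alternative: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error of paired values (range 0 to 2)."""
    errors = []
    for b, a in zip(base, alternative):
        denominator = (abs(b) + abs(a)) / 2
        # Two zeros are treated as a perfect match to avoid dividing by zero.
        errors.append(0.0 if denominator == 0 else abs(a - b) / denominator)
    return sum(errors) / len(errors)
```

For example, smape([100, 50], [50, 50]) averages a relative error of 50 / 75 for the first pair and 0 for the second.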