Important
The following example is more complex than the general examples: it generates data and interacts with Orchestrator. Using and modifying it requires a greater understanding of Python functions, variable handling, and additional tools such as Pandas.
Note
The code referenced in this document and all published examples with pyedgeconnect are available from the GitHub repository within the examples folder. Each example script contains logic to authenticate to the Orchestrator as documented in the authentication example.
Clone the repository and download the examples with:
$ git clone https://github.com/aruba/pyedgeconnect.git
Collecting Appliance Timeseries Data for 95th Percentile Analysis
The focus of this example is analyzing 95th percentile utilization, both as a system aggregate and per WAN interface.
There are two main functions: collecting appliance data, and reporting on previously collected data. Orchestrator only keeps minute data for appliance interfaces for the period set by the Statistics Retention setting for Interface stats (72 hours by default). Running the collection repeatedly over time, and then running the reporting function across all of the collected data, gives a more accurate view of 95th percentile utilization over a longer sample period.
This demo is not meant to replace long-term statistics retention or exporting to a proper data storage solution / database. It is a lightweight workflow for getting a sense of utilization without the overhead of deploying a database and query logic.
When running collection, the code retrieves appliance interface timeseries data from Orchestrator. Collection can run for all appliances or be limited to a specific appliance. It only collects timeseries data for WAN interfaces that currently have a WAN label applied.
Deployment information, including WAN interface comments (a common field for recording circuit IDs, etc.), is collected and stored in a separate metadata file for reference by the reporting function. Collection can optionally be run to update only this deployment information file, capturing updated comments without collecting the interface timeseries minute data.
When the timeseries data is collected, it is augmented with additional calculations and metadata: percent utilization of the WAN interface against the bandwidth values configured in the appliance deployment, the interface comments, the appliance license value, and the timestamp converted to local time based on the appliance location's timezone.
The timeseries data is then stored in a CSV file per-interface for the collection period for analysis.
When run for reporting, all collected CSV files are brought into Pandas DataFrames and merged per appliance. This allows calculation of total system WAN utilization across all labels for a particular timestamp.
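To illustrate what merging per appliance enables, here is a minimal stdlib sketch (not the script's Pandas code) that sums the per-interface bytes for one appliance at each timestamp and expresses the result as a percentage of the appliance's total system deployment bandwidth. The row keys (`timestamp`, `tx_bytes`, `rx_bytes`) and the function name are hypothetical, and the system maximums are assumed to be in Kbps, like the per-interface deployment values.

```python
from collections import defaultdict


def aggregate_system(rows, system_max_tx_kbps, system_max_rx_kbps):
    """Merge per-interface rows for ONE appliance into per-timestamp system rows.

    Hypothetical input: each row is a dict with 'timestamp' plus 'tx_bytes'
    and 'rx_bytes' (bytes transferred in that minute). The system maximums are
    assumed to be the appliance's total deployment bandwidth in Kbps.
    """
    totals = defaultdict(lambda: {"tx_bytes": 0, "rx_bytes": 0})
    for row in rows:
        # Rows from different interfaces/labels share a timestamp bucket
        bucket = totals[row["timestamp"]]
        bucket["tx_bytes"] += row["tx_bytes"]
        bucket["rx_bytes"] += row["rx_bytes"]

    system_rows = []
    for timestamp in sorted(totals):
        tx_bytes = totals[timestamp]["tx_bytes"]
        rx_bytes = totals[timestamp]["rx_bytes"]
        system_rows.append({
            "timestamp": timestamp,
            "tx_bytes": tx_bytes,
            "rx_bytes": rx_bytes,
            # bytes/minute -> bits/second, divided by Kbps -> bps, as a percentage
            "system_dep_pct_outbound": round(
                tx_bytes * 8 / 60 / (system_max_tx_kbps * 1000) * 100, 2
            ),
            "system_dep_pct_inbound": round(
                rx_bytes * 8 / 60 / (system_max_rx_kbps * 1000) * 100, 2
            ),
        })
    return system_rows
```

For example, two interfaces each sending 3,750,000 bytes in the same minute on an appliance with a 10,000 Kbps system deployment value produce one system row at 10.0% outbound utilization.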
The code then runs 95th percentile reporting at both the per-label and the aggregate system throughput level. All values are calculated as percent utilization of the deployment/license values, so as to normalize them across appliances and make outliers easy to identify.
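The percentile step itself can be sketched without Pandas. This is a stand-in, not the script's code: `statistics.quantiles` with `method="inclusive"` interpolates linearly at position 0.95 × (n − 1), the same default interpolation as Pandas `Series.quantile(0.95)`. The function name `percentile_95` is hypothetical.

```python
import statistics


def percentile_95(values):
    """Return the 95th percentile of percent-utilization values, rounded to 2 places.

    method="inclusive" interpolates linearly at position 0.95 * (n - 1),
    matching the default linear interpolation of pandas Series.quantile(0.95).
    Before Python 3.13, statistics.quantiles needs at least two values.
    """
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return round(cut_points[94], 2)  # 99 cut points; index 94 is the 95th
```

Because the inputs are already percentages of deployment or license values, the same function applies to a single label's rows (e.g. only rows where the label is `INET1`) or to per-timestamp system totals, and results are comparable across appliances of different sizes.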
The calculated report data is exported to a simple CSV. The CSV can then be filtered further, for example in Excel or a similar tool, to show the top 10, apply value thresholds, or create charts.
Before running the code, make sure that the Python packages required in
addition to pyedgeconnect are installed, as this report uses multiple
packages to process the collected data.
These are listed in the requirements.txt file in the
wan_util_95th example directory:
pip install -r requirements.txt
Data Collection
Note
The timeseries data gathered in this code example is the same data
viewed in Orchestrator on the Interface Bandwidth Trends tab:
Monitoring->Bandwidth->Overlays & Interfaces->Interface Trends
The code will collect timeseries data for one or all appliances from the current time going back 72 hours from Orchestrator.
The code will only collect deployment and timeseries data for appliances
that are currently in a reachable state to Orchestrator. When
polling pyedgeconnect.Orchestrator.get_appliances() the appliance
has to have a state with a value of 1 to be considered
reachable.
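A minimal sketch of the reachability filter, assuming the input is the list of appliance dicts returned by `get_appliances()`. The `state` key and the value `1` come from the text above; the `hostName` and `nePk` key names used for the hostname-to-NePK map are assumptions. A hand-built list stands in for a live Orchestrator response in the test.

```python
def reachable_appliances(appliances):
    """Return only the appliances reported as reachable (state == 1)."""
    return [appliance for appliance in appliances if appliance.get("state") == 1]


def hostname_to_nepk(appliances):
    """Map hostnames to NePK values for reachable appliances only.

    'hostName' and 'nePk' are assumed key names in each appliance dict.
    """
    return {
        appliance["hostName"]: appliance["nePk"]
        for appliance in reachable_appliances(appliances)
    }
```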
Storage/Analysis Implications
Contiguous minute data represents 1,440 rows for a 24-hour period. With 21 columns stored for each appliance interface, that is 30,240 datapoints per 24-hour period for each WAN interface captured. With the default behavior of capturing the maximum 72 hours of minute data, this typically results in a CSV file of ~850-900KB per WAN interface. Interfaces with particularly long comment strings produce larger files, because the comment is repeated for every timestamp in the stored raw data.
Extrapolating this to an environment with 100 appliances with 2 WAN interfaces each, a collection for the past 72 hours would generate 18,144,000 data points (6,048,000 per 24-hour period), 200 CSV files, and take up ~170MB of disk.
In a larger environment with 375 appliances and a total of 750 WAN interfaces, 72 hours of data stores ~70M data points and takes approximately 700MB of disk.
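The sizing arithmetic can be reproduced with a small helper, useful for estimating a collection in your own environment before running it. The 21 columns and one row per minute come from the text; `kb_per_72h_file` defaults to an assumed 875KB, the midpoint of the quoted ~850-900KB, and file size is scaled linearly for other periods, which ignores the effect of long comment strings.

```python
MINUTES_PER_HOUR = 60  # one row per minute of data
COLUMNS_PER_ROW = 21   # columns stored per interface CSV, per the text above


def estimate_collection(wan_interfaces, hours=72, kb_per_72h_file=875):
    """Estimate (datapoints, disk in MB) for a collection run.

    kb_per_72h_file is an assumed midpoint of the quoted per-interface
    file size for 72 hours; it is scaled linearly for other periods.
    """
    rows_per_interface = MINUTES_PER_HOUR * hours
    datapoints = wan_interfaces * rows_per_interface * COLUMNS_PER_ROW
    disk_mb = wan_interfaces * kb_per_72h_file * (hours / 72) / 1000
    return datapoints, disk_mb
```

For 200 WAN interfaces over 72 hours this gives 18,144,000 datapoints and ~175MB (6,048,000 datapoints for a 24-hour window); for 750 interfaces it gives 68,040,000 datapoints, consistent with the ~70M quoted above.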
Data Calculations
Calculated fields in addition to those natively from Orchestrator per-wan-interface include:
tz_time: The epoch timestamp from the data, offset from UTC to the appliance's local timezone and converted to a datetime string in the format "%m-%d-%Y%H:%M:%S".
pct_outbound: The percent utilization (0-100) of the outbound bytes transferred over that minute: converted to bits, averaged to a per-second value, and divided by the deployment outbound bandwidth for that interface, converted from its native Kbps to bits per second.
point["pct_outbound"] = (
    ((point["tx_bytes"] * 8) / 60) / (point["max_bw_tx"] * 1000)
) * 100
pct_inbound: The percent utilization (0-100) of the inbound bytes transferred over that minute: converted to bits, averaged to a per-second value, and divided by the deployment inbound bandwidth for that interface, converted from its native Kbps to bits per second.
point["pct_inbound"] = (
    ((point["rx_bytes"] * 8) / 60) / (point["max_bw_rx"] * 1000)
) * 100
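The per-point augmentation can be sketched as a single function. The percentage formulas are the ones shown above; the `timestamp` key, the function name, and the use of a fixed UTC offset (which ignores daylight saving time; `zoneinfo.ZoneInfo` would handle it) are assumptions for illustration. The sketch also puts a space between date and time in the output format for readability.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a space between date and time for readability; adjust to match
# the format string used by your copy of the script.
TZ_TIME_FORMAT = "%m-%d-%Y %H:%M:%S"


def augment_point(point, utc_offset_hours):
    """Add tz_time, pct_outbound and pct_inbound to one timeseries point.

    Hypothetical inputs: point["timestamp"] is epoch seconds (UTC), tx/rx
    bytes are totals for that minute, max_bw_tx/max_bw_rx are deployment
    values in Kbps, and utc_offset_hours is a fixed offset (no DST handling).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_dt = datetime.fromtimestamp(point["timestamp"], tz=local_tz)
    point["tz_time"] = local_dt.strftime(TZ_TIME_FORMAT)
    # bytes/minute -> bits/second, divided by Kbps converted to bps
    point["pct_outbound"] = (
        ((point["tx_bytes"] * 8) / 60) / (point["max_bw_tx"] * 1000)
    ) * 100
    point["pct_inbound"] = (
        ((point["rx_bytes"] * 8) / 60) / (point["max_bw_rx"] * 1000)
    ) * 100
    return point
```

As a worked example, 7,500,000 bytes in one minute is 60,000,000 bits, or 1,000,000 bits per second, which is 10% of a 10,000 Kbps deployment value.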
Warning
Data already collected in existing CSV files is never aged out, so without cleaning up the collection, a large amount of data can accumulate over time. This example is meant to inspire what’s possible, not to handle a long-term reporting workflow where data is stored in a database, aged out on a retention schedule, and given other production-quality attributes.
Exported Files
Data collection will create, or replace, the existing file named
appliance_interface_comments.json in the wan_int_tseries_data
sub-directory.
It will also create CSV files for each labeled wan interface of each
appliance collected in the wan_int_tseries_data sub-directory. The
files are named in the format of
<hostname>__<interface>_<label>.csv.
Example: EC-01__wan0_INET1.csv
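A sketch of building and parsing this naming convention, which is handy when grouping collected files by hostname before merging them per appliance. The function names are hypothetical, and the parser assumes hostnames contain no `__` and interface names contain no `_` (true for names like `wan0`); labels may contain `_`.

```python
def build_csv_name(hostname, interface, label):
    """Build '<hostname>__<interface>_<label>.csv'."""
    return f"{hostname}__{interface}_{label}.csv"


def parse_csv_name(filename):
    """Split a collected file name back into (hostname, interface, label).

    Assumes no '__' in the hostname and no '_' in the interface name.
    """
    stem = filename[: -len(".csv")] if filename.endswith(".csv") else filename
    hostname, interface_and_label = stem.split("__", 1)
    interface, label = interface_and_label.split("_", 1)
    return hostname, interface, label
```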
Data Reporting
Note
The output of this code is not meant to be a “production-ready” report, but to provide guidance on ways to retrieve and manipulate EdgeConnect timeseries data for further analysis.
Data Filtering for Analysis
As data is ingested back from the collected CSV files, it is filtered at three primary points to reduce the data to analyze:
1. Remove data outside the defined operating hours, as set by the variables BUSINESS_HOURS_START and BUSINESS_HOURS_END. Each is represented as a 24-hour clock time in the format HH:MM.
2. Remove data outside the defined operating weekdays, as set by the variables BUSINESS_DAY_START and BUSINESS_DAY_END. Each is represented as an integer where 0 represents Monday, incrementing through 6 representing Sunday. The range is inclusive, e.g. 0-4 includes Monday through Friday.
3. Drop duplicate timestamps for the same appliance label/interface once all files for a single appliance have been merged.
Default Filtering values
BUSINESS_HOURS_START = 09:00
BUSINESS_HOURS_END = 17:00
BUSINESS_DAY_START = 0
BUSINESS_DAY_END = 4
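The three filtering steps can be sketched in plain Python using the default values above. This is not the script's Pandas implementation: the row keys (`label`, `interface`, `local_dt`) and function names are hypothetical, and treating both the hour bounds and the weekday bounds as inclusive is an assumption.

```python
from datetime import datetime

# Default filtering values from the text above
BUSINESS_HOURS_START = "09:00"
BUSINESS_HOURS_END = "17:00"
BUSINESS_DAY_START = 0  # Monday
BUSINESS_DAY_END = 4    # Friday


def in_business_window(local_dt):
    """True if a local datetime falls within business days and hours.

    Assumption: both the hour and the weekday bounds are inclusive.
    """
    start = datetime.strptime(BUSINESS_HOURS_START, "%H:%M").time()
    end = datetime.strptime(BUSINESS_HOURS_END, "%H:%M").time()
    return (
        BUSINESS_DAY_START <= local_dt.weekday() <= BUSINESS_DAY_END
        and start <= local_dt.time() <= end
    )


def filter_rows(rows):
    """Keep in-window rows, dropping repeated (label, interface, time) rows.

    Each row is a dict with hypothetical keys 'label', 'interface' and
    'local_dt' (a datetime in the appliance's local time).
    """
    seen = set()
    kept = []
    for row in rows:
        key = (row["label"], row["interface"], row["local_dt"])
        if key in seen or not in_business_window(row["local_dt"]):
            continue
        seen.add(key)
        kept.append(row)
    return kept
```

Duplicates arise naturally when collections run more than 72 hours apart less often than the retention window, so overlapping minutes appear in more than one collection.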
Large Data Analysis Implications
As noted in the previous section, this collection can create large amounts of data, which in turn can take longer to process.
Estimated reporting runtimes from testing against data sets of different sizes, filtering for local 9-5, Monday-Friday:
| Appliances | WAN Interfaces | Report Time |
|---|---|---|
| 16 | 38 | ~5 sec |
| 170 | 358 | ~22 sec |
| 380 | 780 | ~44 sec |
Larger environments will collect significantly more data and will in turn take longer to analyze.
Data Calculations
Calculated fields in the report include:
System Agg vs Deployment Out: The 95th percentile of percent utilization (0-100) of the outbound data for a particular appliance, compared against its total system deployment maximum values.
System Agg vs Deployment In: The 95th percentile of percent utilization (0-100) of the inbound data for a particular appliance, compared against its total system deployment maximum values.
System Agg vs License Out: The 95th percentile of percent utilization (0-100) of the outbound data for a particular appliance, compared against its bandwidth license value.
System Agg vs License In: The 95th percentile of percent utilization (0-100) of the inbound data for a particular appliance, compared against its bandwidth license value.
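# Per-timestamp system utilization (%) against total deployment and license
# bandwidth: bytes per minute -> bits per second, Kbps -> bps
system_df["system_dep_pct_inbound"] = round(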
(
(system_df["rx_bytes"] * 8 / 60)
/ (system_df["system_max_inbound"] * 1000)
) * 100,
2,
)
system_df["system_dep_pct_outbound"] = round(
(
(system_df["tx_bytes"] * 8 / 60)
/ (system_df["system_max_outbound"] * 1000)
) * 100,
2,
)
system_df["system_lic_pct_inbound"] = round(
(
(system_df["rx_bytes"] * 8 / 60)
/ (system_df["license"] * 1000)
) * 100,
2,
)
system_df["system_lic_pct_outbound"] = round(
(
(system_df["tx_bytes"] * 8 / 60)
/ (system_df["license"] * 1000)
) * 100,
2,
)
<label> - Out: The 95th percentile of percent utilization (0-100) of the outbound data for a particular appliance, for the interface with the corresponding WAN label.
<label> - In: The 95th percentile of percent utilization (0-100) of the inbound data for a particular appliance, for the interface with the corresponding WAN label.
for label in labels:
    # Keep only this label's rows before taking its 95th percentile
    label_df = df[df.label == label]
    label_analytics[f"{label} - inbound"] = round(
        label_df.pct_inbound.quantile(0.95), 2
    )
    label_analytics[f"{label} - outbound"] = round(
        label_df.pct_outbound.quantile(0.95), 2
    )
Exported Files
Report files are saved in the wan_int_tseries_reports
sub-directory.
The report is the DataFrame of 95th percentile calculations of percent
utilization against license, deployment, and interface deployment values,
exported to CSV. This file is named
<YYYY-MM-DD_HH_MM_SS>_report_dataframe.csv.
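A sketch of producing the timestamped report path. The directory and name pattern come from the text above; the function and constant names are hypothetical. In practice the directory would need to exist (for example via `os.makedirs(REPORT_DIR, exist_ok=True)`) before writing a DataFrame to it with `to_csv`.

```python
import os
from datetime import datetime

REPORT_DIR = "wan_int_tseries_reports"


def report_path(now=None):
    """Return 'wan_int_tseries_reports/<YYYY-MM-DD_HH_MM_SS>_report_dataframe.csv'."""
    now = now or datetime.now()
    return os.path.join(REPORT_DIR, f"{now:%Y-%m-%d_%H_%M_%S}_report_dataframe.csv")
```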
Python Script & Orchestrator API calls
Runtime arguments
The python script has multiple runtime arguments defined. A user must
specify -c or -r at a minimum to guide collection or reporting
of data. The other arguments are optional.
All runtime arguments are as follows:
-o or --orch
Type: String
Desc: Specify the Orchestrator IP or FQDN. This can be used to include the Orchestrator as text in the HTML report header without connecting to Orchestrator, when only reporting on previously collected data in CSV files.
Example values: 192.0.2.100 or orchestrator.<company>.com
Default value: None
-a or --appliance
Type: String
Desc: Specify a single appliance by hostname, either to collect data for or to filter on when analyzing existing data files.
Default value: None
-c or --collect
Type: Boolean
Desc: Run the collection portion of the script to collect data for one or all appliances.
Default value: None
-d or --deployment
Type: Boolean
Desc: Only collect deployment/interface comment data, to update the metadata used when reporting is run.
Default value: None
-r or --report
Type: Boolean
Desc: Run the reporting/analysis portion of the script to analyze previously collected data files for one or all appliances.
Default value: None
-ll or --loglevel
Type: String
Desc: Logging level for the script; example values include INFO, DEBUG, ERROR, etc.
Default value: None
Running the script to collect data for all appliances:
python wan_util_95th.py -c
Running the script to collect data for a single appliance:
python wan_util_95th.py -c -a MY-appliance-01
Running the script to report data for all existing files:
python wan_util_95th.py -r
Running the script to update deployment interface comments:
python wan_util_95th.py -c -d
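The documented arguments map naturally onto `argparse`. This is a sketch of how such a parser could be declared, not the script's actual parser: the `dest` names follow the long option names, the Boolean flags use `store_true` (so they default to `False` rather than the documented `None`, both falsy), and the "-c or -r required" rule is enforced with `parser.error`.

```python
import argparse


def build_parser():
    """Declare the documented runtime arguments (sketch)."""
    parser = argparse.ArgumentParser(
        description="WAN utilization 95th percentile collection/reporting (sketch)"
    )
    parser.add_argument("-o", "--orch", type=str, default=None,
                        help="Orchestrator IP or FQDN")
    parser.add_argument("-a", "--appliance", type=str, default=None,
                        help="Single appliance hostname")
    parser.add_argument("-c", "--collect", action="store_true",
                        help="Collect timeseries data")
    parser.add_argument("-d", "--deployment", action="store_true",
                        help="Only update deployment/interface comment data")
    parser.add_argument("-r", "--report", action="store_true",
                        help="Report on previously collected data")
    parser.add_argument("-ll", "--loglevel", type=str, default=None,
                        help="Logging level, e.g. INFO, DEBUG, ERROR")
    return parser


def parse_args(argv=None):
    """Parse arguments, requiring at least one of -c or -r."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.collect or args.report):
        parser.error("specify -c/--collect or -r/--report")  # exits with status 2
    return args
```

For example, `parse_args(["-c", "-a", "MY-appliance-01"])` returns a namespace with `collect=True` and `appliance="MY-appliance-01"`.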
Orchestrator API calls
The three API calls to Orchestrator (outside of authentication) are:
The get_appliances function retrieves all appliances from Orchestrator
to map hostnames to their underlying NePK values and to obtain other
metadata, including each appliance's state with Orchestrator.
The get_appliance_deployment function returns the full deployment
configuration of the appliance, including interface names, comments,
labels, and both per-interface and total system WAN bandwidth values,
among other details.
The get_timeseries_stats_interface_single_appliance will return
timeseries data for interfaces on an appliance. This will return a
maximum of 10,000 datapoints and so offers multiple filters to
limit the scope of the query for traffic type, interface name, start
and end time of the data.
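When a query could exceed the 10,000-datapoint limit, one option is to split the start/end range into smaller windows. The sketch below assumes each returned series (for example, one interface and traffic type) contributes one datapoint per minute; how the API actually counts datapoints is an assumption here, and the function name is hypothetical. Under that assumption, 72 hours of minute data for a single filtered interface is 4,320 points and fits in one call.

```python
MAX_DATAPOINTS = 10_000  # per-call limit stated above


def query_windows(start_epoch, end_epoch, series_per_minute=1, max_points=MAX_DATAPOINTS):
    """Split [start_epoch, end_epoch) into (start, end) windows of epoch seconds.

    Assumes each series contributes one datapoint per minute, so a window
    may span at most max_points // series_per_minute minutes.
    """
    minutes_per_window = max(1, max_points // series_per_minute)
    step = minutes_per_window * 60
    windows = []
    current = start_epoch
    while current < end_epoch:
        window_end = min(current + step, end_epoch)
        windows.append((current, window_end))
        current = window_end
    return windows
```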