
Mixpanel Analysis and Viz
analysis.Rmd
Introduction
In addition to accessing the Mixpanel API,
emmytics has a number of functions to help standardize
analysis.
Standard Analytic Question
To demonstrate some of the munging utility functions, let’s answer a simple question that a state colleague is interested in - what does the distribution of sync times look like for this pilot?
The first thing we will do is read in the NDJSON file from the pilot that we accessed through the API and stored locally.
path <- list.files("Data/", "json", full.names = TRUE)
df <- read_mixpanel(path)
The read_mixpanel function reads the NDJSON file using
arrow, which loads the data quickly, and converts it into
a tibble for ease of use. Along the way, it is also:
- converting the dates to actual date fields
- standardizing the events field by removing the provider and storing it in its own column (e.g. ApplicantFinishedArgyleSync and ApplicantFinishedPinwheelSync both become ApplicantFinishedSync)
- filtering out caseworker events, since the focus is on applicants
- dropping page view events (CbvPageView), as they don’t have any useful properties
- adding the pilot information for ease of use across pilots, internally running set_pilot
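The event standardization and page view filtering described in the list above can be sketched in base R. This is only an illustration under stated assumptions, not the emmytics implementation: the provider names come from the example above, while the standardize_events helper and the ApplicantViewedHelp event are hypothetical.

```r
# Illustrative sketch only; read_mixpanel's internals may differ.
# standardize_events() is a hypothetical helper, not part of emmytics.
standardize_events <- function(df, providers = c("Argyle", "Pinwheel")) {
  pattern <- paste(providers, collapse = "|")

  # Record which provider (if any) appears in each event name
  hit <- regexpr(pattern, df$event)
  df$provider <- NA_character_
  df$provider[hit > 0] <- regmatches(df$event, hit)

  # Remove the provider so the same event lines up across providers
  df$event <- sub(pattern, "", df$event)

  # Drop page view events, which carry no useful properties
  df[df$event != "CbvPageView", , drop = FALSE]
}

raw <- data.frame(
  event = c("ApplicantFinishedArgyleSync", "ApplicantFinishedPinwheelSync",
            "CbvPageView", "ApplicantViewedHelp"),  # last name is made up
  stringsAsFactors = FALSE
)

standardized <- standardize_events(raw)
standardized
```

Keeping the provider in a separate column means a single filter on the event name covers every provider, while provider-level comparisons remain possible.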
From here we can subset our dataset down to just the syncing events which will contain the information we are looking for on sync times.
df_syncs <- df |>
  filter(event == "ApplicantFinishedSync")
Although not terribly useful here, you can run clean_events
to add spacing and remove the repetitive “Applicant” prefix from all events. This
function comes in handy for plotting data. For this data, you would see
“ApplicantFinishedSync” become a more legible “Finished Sync”.
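As a rough idea of the renaming involved, the sketch below strips the prefix and spaces out the CamelCase words in base R. The clean_event_name helper is hypothetical and only approximates what clean_events might do to a single event name.

```r
# Hypothetical stand-in for the renaming described above;
# clean_events itself may work differently.
clean_event_name <- function(x) {
  x <- sub("^Applicant", "", x)          # drop the repeated prefix
  gsub("([a-z])([A-Z])", "\\1 \\2", x)   # add a space between CamelCase words
}

clean_event_name("ApplicantFinishedSync")
```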
df_syncs <- df_syncs |>
  clean_events()
df_syncs %>%
  distinct(event, event_clean)
The sync time information is stored within the nested
properties column. We can extract it using another
emmytics function, extract_properties.
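To make the idea of a nested column concrete, here is a minimal base R sketch that pulls one field out of a list column. It assumes properties is a list of named lists, which may not match how arrow actually stores it; the pull_property helper and the duration values are made up for illustration.

```r
# Toy data: a list column standing in for the nested properties column.
# The values are invented; the third event has no sync duration recorded.
syncs <- data.frame(event = rep("ApplicantFinishedSync", 3),
                    stringsAsFactors = FALSE)
syncs$properties <- list(
  list(sync_duration_seconds = 42.5),
  list(sync_duration_seconds = 61),
  list()
)

# Hypothetical helper: extract one named numeric property per row,
# returning NA where the property is missing.
pull_property <- function(props, name) {
  vapply(props, function(p) {
    value <- p[[name]]
    if (is.null(value)) NA_real_ else as.numeric(value)
  }, numeric(1))
}

syncs$sync_duration_seconds <- pull_property(syncs$properties,
                                             "sync_duration_seconds")
```

Returning NA for missing properties keeps one row per event, which is why the summary statistics later pass na.rm = TRUE.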
df_syncs <- df_syncs |>
  extract_properties(sync_duration_seconds = properties$sync_duration_seconds)
An alternative would have been to extract this during the
data import step and then drop the properties column altogether
(drop_prop = TRUE).
df_alt <- read_mixpanel(path,
                        sync_duration_seconds = properties$sync_duration_seconds,
                        drop_prop = TRUE)
Now that we have the data we want, we can report the summary stats.
df_syncs |>
  summarise(
    n = n(),
    min = min(sync_duration_seconds, na.rm = TRUE),
    q25 = quantile(sync_duration_seconds, 0.25, na.rm = TRUE),
    median = median(sync_duration_seconds, na.rm = TRUE),
    mean = mean(sync_duration_seconds, na.rm = TRUE),
    q75 = quantile(sync_duration_seconds, 0.75, na.rm = TRUE),
    max = max(sync_duration_seconds, na.rm = TRUE),
    .groups = "drop"
  )
Applicant tracking/journey visuals
Another standard review that might occur is tracking an applicant’s
journey of events. This can be done within the Mixpanel site itself, but
it is often useful to do this review while working on a local analytic
question. The follow_applicant function leverages gt with some Font Awesome
icons to better visualize this process.
You will need to identify a particular applicant to track; in this
case we will use applicant-123456. The function will give
us a tabular display of each of the events (with icons at key events)
and their associated times.
df %>%
  follow_applicant("applicant-123456")
There is a similar pair of functions for creating a more graphical display. Here you need to run two functions, first to munge the data and then to plot the visual.
You will need to have the Font Awesome 7 Free typeface installed to visualize this. It is free to download and can be easily installed on your desktop.
df %>%
  munge_journey("applicant-123456") %>%
  plot_journey()
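For intuition about the munging step, the base R sketch below filters one applicant's events, orders them by time, and computes the minutes elapsed since the first event. It is not what munge_journey does internally: the distinct_id and time column names, the event names other than ApplicantFinishedSync, and all of the values are assumptions made for illustration.

```r
# Toy event log; column names and values are hypothetical.
events <- data.frame(
  distinct_id = c("applicant-123456", "applicant-999999",
                  "applicant-123456", "applicant-123456"),
  event = c("ApplicantFinishedSync", "ApplicantFinishedSync",
            "ApplicantViewedHelp", "ApplicantSharedIncome"),
  time = as.POSIXct(c("2025-01-01 10:05:00", "2025-01-01 10:06:00",
                      "2025-01-01 10:00:00", "2025-01-01 10:09:30"),
                    tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep one applicant's events and put them in chronological order
journey <- events[events$distinct_id == "applicant-123456", ]
journey <- journey[order(journey$time), ]

# Minutes elapsed since the applicant's first event
journey$mins_since_start <- as.numeric(
  difftime(journey$time, journey$time[1], units = "mins")
)
journey
```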