Designed and implemented a baseline study to measure the usability of the DocuSign iOS mobile application. Moderated 20 in-person, lab-based study sessions with prospective DocuSign users.
Insights from this study provided a baseline and helped inform future design direction.
Experimental Research Process Overview
The goal of this study was to establish a usability baseline for the DocuSign iOS mobile application, focused on the top two critical user journeys for mobile customers (the "Sending" and "Sign and Return" experiences), and to gain insights into the usability issues in the current experience.
For this study, effectiveness, efficiency, ease of use and satisfaction were measured at the task level, followed by the System Usability Scale (SUS) and Net Promoter Score (NPS) at the end of the study.
Recruiting participants for in-person study sessions during general office hours was a challenge, and analysis of error types consumed significantly more time and effort than originally estimated.
Insights from the study were presented to Product, Eng, UX leadership teams, and to the corporate executive team. The study helped inform product design changes and influence future design direction.
How can we improve the current user experience and establish a baseline metric for DocuSign's mobile iOS application, around the two critical user journeys:
Journey 1: Signing and Returning a document
Journey 2: Sending a document for signature
Scoping the research question
Effectiveness: How successful are users in completing the given task?
Efficiency: How efficient are users in completing the task?
Ease of use: How do users perceive DocuSign Mobile App’s usability?
User Satisfaction: How satisfied are they with the experience today?
NPS: How likely are users to recommend this product to someone?
Defining the key metrics
Time on Task
Perceived Ease of Use
System Usability Scale (SUS)
Net Promoter Score (NPS)
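The SUS and NPS metrics listed above are conventionally scored as sketched below. This is a minimal illustration of the standard scoring rules, not the study's actual analysis code; the function names and the sample ratings are hypothetical.

```python
def sus_score(responses):
    """Score one participant's 10-item SUS questionnaire (each item rated 1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 item responses")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution is rating - 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution is 5 - rating.
            total += 5 - rating
    return total * 2.5  # rescale the 0-40 sum to a 0-100 score


def net_promoter_score(ratings):
    """NPS from 0-10 'how likely are you to recommend' ratings."""
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 through 6
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical responses, for illustration only.
example_sus = sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 1])
example_nps = net_promoter_score([10, 9, 8, 7, 6, 10])
```

A study-level SUS figure would typically be the mean of the per-participant scores.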
Understanding what to measure and how, how to instrument the variables, the participant screener, defining scenario-based tasks, and post-task and end-of-survey questions
Study Design Overview
Statistical Analysis & Findings
We tested the hypotheses by measuring the dependent variables (Response Time, Response Accuracy, and User Satisfaction on a Likert scale) against the independent categorical variable, the type of parking sign (current textual parking sign versus visual parking sign). In addition, the impact of the moderating variables (frequency of parking, driver gender and driver age) was studied.
Findings: Response Time
H1: Parking signs that include both visual and textual information increase drivers' efficiency of comprehension compared with the current text-only design.
Assumption: Response Time is assumed to be a representative measure of the efficiency of comprehension of the sign.
The Response Time histogram below illustrates the distributions of the control and treatment groups, and the table lists the summary statistics.
The means of both groups were nearly the same.
The mean and median of the Control Group are relatively equivalent, separated by approximately 3 seconds, indicating a near normal distribution.
For the Treatment Group, the Mean is approximately 7 seconds greater than the Median, indicating that the distribution is slightly skewed to the right.
The range of the Treatment Group is 3 times that of the Control Group (max RT = 179 seconds). This is attributed to the outlier, which skews the distribution positively.
The boxplot below was created to clearly visualize the outlier - which makes the maximum Response Time of the Treatment Group 179 seconds.
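The mean-versus-median comparison and the boxplot outlier check described above can be sketched as follows. This is a minimal illustration that assumes the conventional 1.5 x IQR boxplot rule for flagging outliers (the study does not state which rule its plotting tool used); the response times are hypothetical values, not the study's data.

```python
import statistics


def summarize(values):
    """Return summary statistics and any points beyond the 1.5 x IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": median,
        "range": max(values) - min(values),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical response times in seconds, with one extreme value.
treatment_rt = [20, 22, 25, 27, 30, 31, 33, 35, 179]
summary = summarize(treatment_rt)
# A mean well above the median suggests a right-skewed distribution.
is_right_skewed = summary["mean"] > summary["median"]
```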
Since the distributions were not normal, a Welch's two-sample t-test was run to determine whether there is a statistically significant difference between the mean Response Time (continuous variable) of the two groups (nominal categorical variable).
For this test (see below) we assumed a statistical significance threshold of p < 0.05. In this situation, p = 0.9583 > 0.05, meaning that if there were truly no difference between the groups, a difference at least this large would be expected by chance approximately 96 times out of 100. The result is therefore not statistically significant, and we fail to reject the null hypothesis.
In an effort to understand the impact of the outlier, we ran the t-test without the outlier in the treatment group.
Although removing the outlier had an impact on the results (see table), it did not change the statistical significance of the result (p = 0.3016 > 0.05).
The mean response time in the treatment group dropped by almost 3 seconds and the standard deviation dropped by approximately 12 seconds.
Conclusion: Since the results of the t-test were not statistically significant, it may be concluded that Response Time does not show statistically significant variation across the Control (Text Signs) and Treatment (Visual Signs) groups. Further, given the anonymous nature of the study, while we had insufficient data to justify exclusion of the outlier, its impact was shown to be statistically insignificant with reference to driver Response Time. However, the results obtained by its exclusion are clearly informative.
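The Welch's t-test used in this analysis can be sketched as below. This is a minimal illustration of the test statistic and the Welch-Satterthwaite degrees of freedom, not the study's actual analysis script; the sample values are hypothetical. Converting t and the degrees of freedom into a p-value requires a t-distribution CDF, which a statistics package would normally supply.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical response times in seconds for each group.
control_rt = [24, 31, 28, 35, 30, 27]
treatment_rt = [22, 29, 33, 26, 41, 25]
t_stat, dof = welch_t(control_rt, treatment_rt)
```

Unlike the pooled-variance t-test, this form does not assume the two groups share a common variance, which matters when one group's spread is much larger than the other's.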
We created a survey to administer a between-subjects design study. We distracted participants from the study's purpose by introducing it as a test of their comprehension of US traffic and roadway signs, interleaving the 3 signs shown on the left with the control or treatment parking sign.
Survey Design & Execution
I built an online survey using SurveyGizmo to run our study, which included a "Screener" to verify participant qualifications. Participants recruited through convenience sampling were then presented with the online survey, testing their responses to traffic and roadway signs. Once answers were submitted, participants were debriefed on the study purpose.
Participants were randomly assigned to one of two groups: the control group or the treatment group. Each group saw the three general traffic signs and the parking sign in a random order. For every sign, participants answered a verifiable comprehension question, which measured the accuracy of their response, and we also measured their time to respond to determine the efficiency of the sign. We also included two SUS questions to understand the user's perception of the "ease of use" of the sign and their "confidence" in the accuracy of their response.
We used SurveyGizmo for design and administration of the survey and recruited participants through convenience sampling from different social media channels. Of the 100 participants who expressed interest in participating, only 86 qualified for the survey.
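The random group assignment and randomized sign order described above can be sketched as follows. This is a minimal illustration assuming simple per-participant randomization (the study does not say how the survey tool assigned groups); the sign names and function names are placeholders.

```python
import random

# Placeholder names for the three general traffic signs.
GENERAL_SIGNS = ["general_sign_1", "general_sign_2", "general_sign_3"]


def assign_group(rng):
    """Assign a participant to control or treatment with equal probability."""
    return rng.choice(["control", "treatment"])


def sign_order(group, rng):
    """Return the four signs for a participant in a random presentation order."""
    parking_sign = "text_parking_sign" if group == "control" else "visual_parking_sign"
    signs = GENERAL_SIGNS + [parking_sign]
    rng.shuffle(signs)
    return signs


rng = random.Random(42)  # seeded only to make the sketch reproducible
group = assign_group(rng)
order = sign_order(group, rng)
```

Simple randomization of this kind does not guarantee equal group sizes, so some imbalance between groups is expected.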
How can Augmented Reality (AR), Virtual Reality (VR) or 360 Video be leveraged to assist and delight in-market consumers in buying their next car?
Findings: Accuracy of Response
H1: Parking signs that include both visual and textual information increase drivers' accuracy of comprehension compared with the current text-only design.
Assumption: Accuracy is assumed to be a representative measure of the comprehension of the parking sign.
A 2x2 Chi-Square test was used to compare the Accuracy of Response (discrete, categorical) between the two groups and to determine whether there is a significant relationship between the two categorical variables.
The table shows that 39/41 in the Control Group got the right answer, an accuracy rate of 95%.
44/45 in the Treatment Group got the right answer, an accuracy rate of 98%.
Since the p-value of 0.9346 is greater than the significance level of 0.05, we fail to reject the null hypothesis.
Conclusion: We conclude that there is no significant relationship between the type of parking sign and user comprehension, measured in terms of accuracy of response.
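The chi-square comparison can be sketched from the reported counts (39 correct and 2 incorrect in the Control Group, 44 correct and 1 incorrect in the Treatment Group). This is a minimal illustration that assumes Yates' continuity correction was applied, which the study does not state; under that assumption the sketch yields a p-value of about 0.9346, matching the reported figure. The function name is hypothetical.

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Chi-square test with Yates' correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction shrinks the observed-expected discrepancy by n/2, floored at 0.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: Control, Treatment. Columns: correct, incorrect.
chi2, p_value = yates_chi_square_2x2(39, 2, 44, 1)
```

With only three incorrect answers in total, the expected counts in the "incorrect" column fall well below 5, so an exact test such as Fisher's may be worth considering alongside the chi-square.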
Findings: System Usability Scale (SUS) Score
We measured the average SUS score to understand the perceived effects of efficiency and accuracy of comprehension of the parking signs.
The average SUS score is assumed to be a representative measure of users' satisfaction with the sign. We computed the mean of the two SUS responses on a 5-point Likert scale and conducted a t-test on that mean score (interval variable) across both groups (nominal categorical variable).
The histogram below illustrates the distributions of the Control and Treatment groups.
The value of the mean for the control group is closer to 3, which is equivalent to a “neutral” value on the Likert scale.
Correspondingly, the treatment group had a mean value closer to 4, which is equivalent to “agree” on the Likert scale.
The minimum and maximum SUS scores, and therefore the range, are the same across both groups.
The graphs below show the individual components of the SUS score and their frequency distribution on the Likert Scale.
The majority of the scores fell between 4 and 5 for the treatment group.
This implies that the treatment group was more confident about their response to the question asked about the sign comprehension.
The treatment group also found the sign significantly easier to use than the control group.
A Welch’s two-sample t-test was run to determine whether there is a statistically significant difference between the mean SUS scores (interval variable) of the two groups (nominal categorical variable).
Observation: In this situation, p = 0.01616 < 0.05, so the result is statistically significant at the 95% confidence level.
Conclusion: Given the statistically significant result, drivers perceive the Treatment Sign (Visual Sign) to be more satisfactory than the current text sign.
Findings: Impact of Moderating Variables
We used a two-way ANOVA to measure the interaction effects of the moderating variables (Frequency of Parking (ordinal), Gender and Age) on our dependent variables, Response Time and Average SUS Score.
Observations: We found the effects of the moderators (Frequency of Parking, Age, Gender, and Platform Used) to be not statistically significant.
Conclusion: Age, Gender, Frequency of Parking and Platform Used were observed to have no significant impact on driver Response Time and Average SUS Score.
The research and findings reported in this study can be enhanced going forward by incorporating adjustments to eliminate limitations. These include recreation of the driving environment that incorporates distractions with time and space. Participant recruitment through random sampling in future studies is highly recommended, if study timeframe is not a constraint. Further, the impact of potential confounders such as color blindness of the driver and legibility of text on the signs needs to be explored.