Data Science - Life-Domain Analysis

Regret Analysis

Preliminary results: understanding regret patterns across career, immigration, and relationships using Reddit discourse

Project Overview

People express regret after high-stakes life decisions: career changes, immigration moves, relationship choices. This project collects regret-related posts from Reddit, engineers structured features (sentiment scores, reversal indicators, time-to-regret), and uses statistical and machine-learning methods to understand what predicts decision reversal.

Core Question What factors (linguistic sentiment, decision domain, or elapsed time) predict whether someone reporting regret will also report having reversed their decision?

Dataset

Data is collected from Reddit's public JSON API across nine subreddits in three life domains. Posts are filtered for explicit regret language (e.g., "I regret", "I wish I had", "my biggest mistake") and screened to exclude hypothetical or future-oriented regret.
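The filtering step described above can be sketched as a pair of regexes; the patterns below are illustrative stand-ins, not the project's exact rules:

```python
import re

# Illustrative patterns only -- the real pipeline uses a larger pattern set.
REGRET_PATTERNS = re.compile(
    r"\b(i regret|i wish i had|my biggest mistake)\b", re.IGNORECASE
)
# Exclude hypothetical / future-oriented regret, e.g. "I might regret this".
HYPOTHETICAL = re.compile(r"\b(would|might|could|will) regret\b", re.IGNORECASE)

def is_regret_post(text: str) -> bool:
    """Return True if the text expresses explicit, non-hypothetical regret."""
    return bool(REGRET_PATTERNS.search(text)) and not HYPOTHETICAL.search(text)
```
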

Property | Value
Raw posts collected | ~21,800 across nine subreddits
Filtered regret posts | 3,604 (structured analysis subset)
Domains | Career (1,998) - Immigration (297) - Relationships (1,309)
Subreddits | r/cscareerquestions, r/careerguidance, r/jobs, r/careeradvice, r/USCIS, r/IWantOut, r/immigration, r/relationship_advice, r/relationships
Engineered features | vader_compound, vader_neg, urgency_score, reversal, time_to_regret_days, topic, emotion, event_type, agency_score, hedging_score, social_embed_score, causal_reasoning_score, future_orient_score, comment_count_log, score_log, sentence embeddings (PCA-10)

The broader collection pipeline scraped ~21.8k raw posts. After regret filtering, deduplication, and structured feature extraction, the preliminary analysis uses 3,604 confirmed regret posts enriched with VADER sentiment and improved temporal extraction (47.2% time coverage, up from 10.7%).

Methods

  1. Collection & Filtering: Reddit API scraping followed by regex-based regret extraction with exclusion of hypothetical language.
  2. Feature Engineering: VADER sentiment analysis on regret sentences; urgency scores from lexical cues; reversal labels from action verbs; time-to-regret from enhanced temporal extraction (regex + dateparser patterns); sentence embeddings (all-MiniLM-L6-v2, PCA-50); 7-class emotion classification (GoEmotions distilRoBERTa); triggering event taxonomy (per-domain keyword-based event type labeling); psycholinguistic features (agency, hedging, social embeddedness, causal reasoning, future orientation); community engagement metrics (log-comments, log-score, upvote ratio).
  3. Causal Inference: Propensity score matching (1:1 nearest-neighbor, career vs immigration); cohort/temporal analysis with logistic regression (year x domain interaction); Kaplan-Meier by era.
  4. Label Validation: Comment-thread scraping (n=700) for reversal confirmation; keyword vs comment-based label agreement (Cohen's kappa); LLM-based annotation pipeline (script ready, requires API key).
  5. Modeling: Cross-domain reversal classification (LR/RF/GBM, stratified 5-fold CV, bootstrap CIs); leave-one-domain-out generalization; NMF topic modeling; Cox PH regression with 13 covariates; nested logistic models for subreddit confounding.
  6. Statistical Testing: Chi-square tests; Mann-Whitney U with effect sizes; Kruskal-Wallis H; log-rank tests; likelihood ratio tests for model comparison; log-odds ratio linguistic analysis (Monroe et al. 2008); bootstrap CIs on all key metrics.
View Code on GitHub

Interactive Data Explorer

Explore key patterns across domains, emotions, and event types. Select a view from the dropdown and hover over data points for details.

Preliminary Results

Result 1

Reversal rates differ significantly across life domains

Question Do regret narratives lead to decision reversal at different rates across career, immigration, and relationships?
Finding Relationships show the highest reversal rate (39.5%), followed closely by career (38.5%). Immigration is notably lower at 27.6%, despite strong regret language. This difference is statistically significant (chi-square = 14.57, p < 0.001).

Key insight

Structural constraints (e.g., immigration systems) limit the ability to act on regret, even when emotional intensity is high. Immigration combines strong regret expression with the lowest reversal rate, a pattern consistent with barriers to undoing a move or visa path.
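The domain test is a standard chi-square test of independence on the reversal-by-domain contingency table. A minimal sketch, with cell counts reconstructed from the rounded percentages and domain sizes (so the statistic lands near, not exactly at, the reported 14.57):

```python
from scipy.stats import chi2_contingency

# Counts reconstructed from the reported rates; rounding of the percentages
# makes the statistic approximate (reported chi-square = 14.57).
#        reversed, not reversed
table = [
    [769, 1229],   # career: 38.5% of 1,998
    [82,  215],    # immigration: 27.6% of 297
    [517, 792],    # relationships: 39.5% of 1,309
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2e}")
```
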

38.5%
Career reversal
95% CI [36.4%, 40.6%]
27.6%
Immigration reversal
95% CI [22.6%, 32.9%]
39.5%
Relationships reversal
95% CI [37.0%, 42.3%]

Evidence

Figure: Reversal Rate by Domain
Figure: Regret Posts by Domain (n = 3,604)

Model Performance

Model | CV AUC (5-fold) | Test AUC | 95% CI
Logistic Regression | 0.657 +/- 0.017 | 0.641 | [0.598, 0.680]
Random Forest | 0.675 +/- 0.024 | 0.652 | [0.610, 0.691]
Gradient Boosting | 0.653 +/- 0.018 | 0.631 | [0.589, 0.671]

The best-performing model (Random Forest) achieves AUC 0.652 [0.610, 0.691], indicating that text and domain features capture moderate but limited signal for predicting reversal. Top features include has_time, left, vader_compound, and vader_neg.
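The CIs in the table come from bootstrapping the test set. A minimal sketch of a percentile bootstrap over test-set rows, with synthetic stand-ins for the model's predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for test labels and model scores.
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.3 + rng.normal(0.35, 0.25, size=500), 0, 1)

point = roc_auc_score(y_true, y_score)

# Percentile bootstrap: resample test rows with replacement, re-score.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    if len(np.unique(y_true[idx])) < 2:   # AUC needs both classes present
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```
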

Figure: Precision-Recall Curve (Random Forest)
Figure: Top 20 Feature Importances
The feature importance plot reveals that whether a post mentions a specific time frame (has_time) is the strongest single predictor of reversal, followed by action words like "left" and sentiment features (vader_compound, vader_neg). Domain membership also contributes, with immigration as a distinct category.

Next steps

Planned follow-ups
  • Expand immigration corpus (r/greencard, r/h1b) to increase statistical power for immigration sub-analyses
  • Develop a fine-grained immigration sub-classifier: USCIS (bureaucratic) vs IWantOut (voluntary) vs asylum posts likely have different reversal dynamics
  • Test event type as a moderator of the domain-reversal relationship in a formal mediation model
Result 2

Sentiment, not urgency keywords, captures the emotional signal in regret

Question Does the emotional tone of regret language predict whether regret leads to reversal?
Finding Simple urgency keyword counts (mean 0.17) show almost no correlation with reversal (r = 0.05). VADER sentiment analysis finds a statistically significant difference: reversal posts are more negative (mean compound = -0.127) than non-reversal posts (mean = -0.085), Mann-Whitney p = 0.003.

Key insight

Regret language is overwhelmingly reflective, not impulsive. VADER sentiment captures a small but statistically significant emotional difference that simple keyword counting misses entirely.
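The group comparison is a two-sided Mann-Whitney U test on VADER compound scores. A minimal reproduction on synthetic scores (group means match the reported values; the spread and sample sizes are illustrative, so U and p will differ from the study's):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic VADER compound scores mimicking the reported pattern:
# reversal posts slightly more negative on average (means from the text,
# standard deviation chosen for illustration).
reversal = rng.normal(-0.127, 0.2, size=1400)
no_reversal = rng.normal(-0.085, 0.2, size=2200)

u, p = mannwhitneyu(reversal, no_reversal, alternative="two-sided")
print(f"U = {u:.0f}, p = {p:.2e}")
```
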

-0.085
Compound (no reversal)
-0.127
Compound (reversal)
p = 0.003
Mann-Whitney U

Evidence

Figure: VADER Compound Sentiment by Domain
Figure: VADER Negative Score by Reversal and Domain
All three domains show similar overall sentiment distributions (Kruskal-Wallis p = 0.24, not significant). However, within each domain, posts where people acted on their regret (reversal = 1) tend to use slightly more negative language. The effect size is small (r = -0.058) but consistent across domains.
Figure: Correlation Heatmap (including VADER features)
Figure: Model Calibration Curve
The updated correlation heatmap shows that VADER features have stronger associations with other variables than the simple urgency keyword count. The calibration curve shows the model is reasonably well-calibrated in the 0.2-0.6 probability range.

Next steps

Planned follow-ups
  • Test whether emotion mediates the domain-reversal relationship (does immigration suppress anger-driven action?)
  • Investigate whether emotional tone differs by event type within each domain (does abuse regret have higher negative affect than infidelity regret?)
  • Test whether sentiment intensity predicts speed of regret rather than reversal itself
Result 3

Regret timing varies dramatically by domain, with relationships appearing fastest

Question How quickly do people report regret after major life decisions, and does this differ by domain?
Finding With improved temporal extraction (47.2% coverage, up from 10.7%), relationship regret surfaces at a median of 60 days, career regret at 210 days, and immigration regret at 365 days. These estimates are based on 1,695 posts with extractable time information.

Key insight

Relationship regret appears within weeks; career and immigration regret takes months to years. This reflects faster feedback loops in interpersonal decisions versus the slow-unfolding consequences of structural life changes.
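The improved temporal extraction can be sketched with a single regex; the pipeline itself uses 15+ patterns plus dateparser, so this shows only the core idea:

```python
import re

# Illustrative single pattern; the real pipeline covers many more forms.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

PATTERN = re.compile(
    r"(\d+(?:\.\d+)?|a|an)\s+(day|week|month|year)s?\s+(?:ago|later|after)",
    re.IGNORECASE,
)

def time_to_regret_days(text: str):
    """Return the first extractable elapsed time, in days, or None."""
    m = PATTERN.search(text)
    if not m:
        return None
    qty = 1.0 if m.group(1).lower() in ("a", "an") else float(m.group(1))
    return qty * UNIT_DAYS[m.group(2).lower()]
```
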

210 d
Career median
95% CI [180, 365] n=783
365 d
Immigration median
95% CI [365, 548] n=110
60 d
Relationships median
95% CI [60, 90] n=801

Evidence

Figure: Distribution of Time to Regret (n = 1,695)
Figure: Time to Regret by Domain (boxplot)
The improved temporal extraction (using 15+ regex patterns beyond the original 4) reveals a much clearer picture. Relationship regret clusters strongly around 30-90 days, while career regret is broadly distributed across 6 months to 2+ years. Immigration regret centers around the 1-year mark, consistent with visa processing timelines.
Figure: Kaplan-Meier Survival Curves by Domain
Figure: Time-to-Regret Availability by Domain
The Kaplan-Meier curves show a clear separation: the relationship survival curve drops fastest (regret surfaces earliest), while immigration and career curves decline more gradually. With the improved temporal extraction, 47.2% of posts now have time information (up from 10.7%), substantially increasing confidence in these estimates.
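The survival curves use the standard product-limit estimator. A compact sketch on toy data (not the project's code), with censoring handled via an event indicator:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate; events=1 observed, 0 censored."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    surv, s = [], 1.0
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                      # still in risk set
        deaths = np.sum((times == t) & (events == 1))     # events at time t
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return surv

# Toy example: days to regret expression; one post censored at 90 days.
curve = kaplan_meier([30, 60, 60, 90, 120], [1, 1, 1, 0, 1])
```
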

Next steps

Planned follow-ups
  • Apply NLP-based date parsing (SUTime, spaCy) to capture temporal expressions beyond current regex coverage
  • Investigate whether event type (voluntary_departure vs visa_denial) interacts with temporal trajectory in immigration
  • Test whether early vs. late regret have different event-type distributions
Supplementary

Topic modeling reveals domain-specific regret patterns

Method NMF (Non-negative Matrix Factorization) topic model with 10 topics across all 3,604 posts, using TF-IDF features.
Figure: Reversal Rate by Topic and Domain
The heatmap reveals that reversal rates vary substantially by topic. Job offer-related topics (T6: "job, offer, new, quit, leaving") show the highest reversal rates, consistent with career decisions that are relatively easy to undo. Topics dominated by deep reflection (T2: "regret, decision, degree, years") show moderate reversal, while help-seeking topics (T4: "know, situation, fix") show lower reversal, suggesting people still in the deliberation phase.
Result 4

What triggers regret matters: voluntary departures reverse at 2x the rate of visa denials

Question Does the type of precipitating event (voluntary choice vs structural barrier vs external force) predict whether regret leads to reversal?
Finding Immigration: voluntary departures (n=13, reversal 53.8%) reverse at 2x the rate of visa denial or error cases (n=80, reversal 26.3%). Relationships: abuse/toxicity posts (n=314, 45.5%) and voluntary endings (n=168, 45.8%) have the highest reversal rates, while external-pressure cases (n=87, 32.2%) have the lowest.

Key insight

Reversibility of the precipitating event predicts reversibility of the regret. When someone chose to leave (voluntary departure, n=13, 53.8%), they retain agency and reversal is possible. When a visa was denied (n=80, 26.3%), reversal requires overcoming an external system barrier, which suppresses action even when emotional regret is present. In relationships, abuse-driven regret (45.5%) leads to action as readily as voluntary endings (45.8%), suggesting high emotional clarity in those situations.

53.8%
Voluntary departure
Immigration (n=13)
26.3%
Visa denial/error
Immigration (n=80)
45.5%
Abuse/toxicity
Relationships (n=314)
32.2%
External pressure
Relationships (n=87)

Evidence

Figure: Reversal Rate by Triggering Event Type, per Domain (with domain mean baseline)
In immigration, the contrast between voluntary departure (53.8%) and visa denial/error (26.3%) reveals that when a government system is the barrier to reversal, emotional regret alone cannot overcome it. In relationships, abuse/toxicity actually achieves a reversal rate comparable to voluntary endings, which may reflect the higher clarity and social support available to those experiencing harmful relationships. External-pressure cases (arranged marriages, family-forced decisions) show the lowest relationship reversal rate (32.2%), consistent with social barriers to undoing pressured decisions. Note: Career event types partially overlap with reversal-label keywords (voluntary quit and involuntary exit share vocabulary with the reversal classifier), so career results are interpreted cautiously.
Methodological note Career event type labels (e.g., "quit", "fired") overlap with the keywords used to assign reversal labels in the pipeline. This means career event type and reversal are not fully independent measures. The immigration and relationship findings are not subject to this limitation.

Next steps

Planned follow-ups
  • Expand immigration corpus (r/greencard, r/h1b, r/f1visa) to increase statistical power for immigration event-type tests (current n=122 with known type is insufficient for chi-square)
  • Develop an independent reversal label that does not rely on the same keywords as event type (e.g., comments-based or temporal follow-up labeling)
  • Test whether event type moderates the emotion-reversal relationship (does abuse amplify or dampen the sentiment signal?)
Result 5

Reversal rates diverge over time: career rises while immigration declines

Question Do reversal patterns change across years, and do external shocks (COVID, Great Resignation) affect the relationship between regret and action?
Finding Logistic regression with year x domain interaction shows that reversal rates are significantly increasing over time for career posts (p = 0.003), but the year x immigration interaction is negative and significant (p = 0.009), indicating that immigration reversal rates have been declining relative to career. Era-level analysis confirms: reversal rates differ by era within career (chi-square = 6.1, p = 0.047) and immigration (chi-square = 6.2, p = 0.044).

Key insight

The Great Resignation did not increase career reversal. Counterintuitively, career posts during 2021-2022 show a lower reversal rate (33.6%) than the baseline (39.2%), though the difference is not conventionally significant (Fisher p = 0.10). A plausible reading is that Great Resignation-era career regret was more often about difficulty finding re-employment than about wanting to undo a quit decision.

p = 0.003
Year effect on reversal
Career posts
p = 0.009
Year x Immigration
Interaction term
33.6%
Great Resignation reversal
Career 2021-2022 (n=241)
39.2%
Baseline reversal
Career all other years

Evidence

Figure: Reversal Rate Over Time by Domain, with COVID and Great Resignation eras highlighted
Figure: Kaplan-Meier survival curves by era (time-to-regret expression)

Next steps

Planned follow-ups
  • Investigate whether specific event types (involuntary exit vs voluntary quit) drive the era-level career effects
  • Analyze whether immigration policy changes (travel bans, visa processing delays) explain the immigration decline
Result 6

Label validation: keyword-based reversal labels show low agreement with comment-based evidence

Question How accurate are the keyword-based reversal labels used throughout this study?
Finding After scraping comment threads for 700 posts and checking for OP replies that confirm action, overall agreement between keyword-based and comment-based labels is 57.7%, with Cohen's kappa = 0.03. The keyword method labels 42.7% of posts as reversal, while comment evidence supports only 5.6%. This large discrepancy indicates that keyword-based labels capture intention to act and descriptions of past action rather than confirmed post-regret behavioral change.

Key insight

Keyword labels measure "action language" rather than verified reversal. Of 299 posts labeled as reversal by keywords, only 21 (7%) had comment-thread evidence confirming action. This does not invalidate the keyword measure, but reframes it: the "reversal" variable in this study is best interpreted as "presence of action-oriented language in regret narratives" rather than "confirmed behavioral change." This is a common operationalization in NLP-based behavioral research, but the magnitude of the gap (kappa = 0.03) is notably low.
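The reported agreement and kappa can be reconstructed from the counts given in the text (21 confirmed of 299 keyword-positive; the comment-only cell is inferred as 39 - 21 = 18, assuming ~39 comment-confirmed posts, i.e. 5.6% of 700):

```python
# Confusion cells implied by the text; the comment-only cell (18) is inferred.
both_yes, kw_only, comment_only, both_no = 21, 278, 18, 383
n = both_yes + kw_only + comment_only + both_no          # 700 posts

p_o = (both_yes + both_no) / n                           # observed agreement
kw_yes = (both_yes + kw_only) / n                        # keyword positive rate
cm_yes = (both_yes + comment_only) / n                   # comment positive rate
p_e = kw_yes * cm_yes + (1 - kw_yes) * (1 - cm_yes)      # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement = {p_o:.1%}, kappa = {kappa:.2f}")
```
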

57.7%
Agreement rate
n=700 posts
0.03
Cohen's kappa
Near-zero agreement
42.7%
Keyword reversal rate
5.6%
Comment reversal rate

Evidence

Figure: Keyword vs Comment-Based Reversal Labels: confusion matrix and disagreement categories
The 278 posts labeled as "reversal" by keywords but not confirmed by comments likely include: (1) posts describing past actions that preceded the regret rather than resulted from it; (2) posts using hypothetical action language ("I want to quit"); and (3) posts where OP did not reply in the comment thread. This finding is critical for interpreting all reversal-related results in this study.

Next steps

Planned follow-ups
  • Run LLM-based annotation (GPT-4o-mini) on a stratified 500-post sample for a more nuanced second opinion (script ready, requires API key)
  • Develop a refined reversal label that combines keyword + comment + title-flair signals into a composite score
  • Report sensitivity analysis: do core findings (domain differences, temporal patterns) hold when using the stricter comment-based label?
Together, these six results reveal a coherent picture: domain-level structural constraints, temporal dynamics, the type of precipitating event, and historical era are all more informative predictors of reversal than linguistic or community features. Domain differences are significant (p < 0.001). Reversal rates diverge over time (year x domain interaction p = 0.009). Voluntary departures reverse at 2x the rate of visa denials. Propensity score matching shows the career-immigration gap narrows from 10.3 to 7.1 percentage points after matching on observables (ATT = 0.071, McNemar p = 0.074), suggesting the gap is partially but not fully explained by linguistic and temporal confounders. Critically, label validation reveals that the keyword-based "reversal" variable captures action language rather than confirmed behavior (kappa = 0.03), reframing all findings as reflecting linguistic patterns in regret narratives.

Extended Analysis

Extended 1

Emotion detection reveals sadness dominates regret, but anger predicts reversal

Method GoEmotions 7-class emotion classification (j-hartmann/emotion-english-distilroberta-base) applied to all 3,604 regret posts.
Finding Sadness is the dominant emotion across all domains (58-64%). Emotion distributions differ significantly across domains (chi-square = 85.8, p < 10^-13, Cramer's V = 0.109). Crucially, emotion is also associated with reversal (chi-square = 20.4, p = 0.002, Cramer's V = 0.075): sadness has the highest reversal rate (39.9%), while neutral posts have the lowest (26.3%).

Key insight

Emotionally engaged regret (especially sadness and anger) is more likely to lead to action. Posts classified as "neutral" have a reversal rate 13 percentage points lower than sadness-labeled posts, suggesting that emotional engagement facilitates, and may be a precondition for, behavioral change.

62.1%
Sadness (overall)
39.9%
Sadness reversal rate
26.3%
Neutral reversal rate
V = 0.075
Cramer's V (emo x rev)

Evidence

Figure: Emotion Distribution by Domain (stacked proportions)
Figure: Reversal Rate by Detected Emotion
Relationships show the highest anger proportion (19.4%), consistent with interpersonal conflict driving regret. Immigration shows more neutral and surprise posts, reflecting bureaucratic uncertainty. The reversal-by-emotion plot reveals that sadness (the most common emotion) is also the most associated with action, while neutral posts rarely lead to reversal.

Next steps

Planned follow-ups
  • Use emotion as a feature in the reversal classifier to test incremental predictive value
  • Investigate whether emotion mediates the domain-reversal relationship
  • Test emotion trajectories in multi-post users (does anger precede reversal?)
Extended 2

Cross-domain generalization: reversal patterns transfer across life domains

Method Leave-one-domain-out evaluation: train Random Forest on 2 domains, test on the held-out domain. Features include TF-IDF, VADER, sentence embeddings (MiniLM PCA-10).
Finding The model generalizes moderately well across domains. Notably, a model trained on career + relationships achieves the highest AUC (0.697) when tested on immigration, despite immigration being the most structurally different domain.

Key insight

Linguistic markers of reversal are partially domain-invariant. The surprisingly strong performance on held-out immigration (AUC 0.697) suggests that the semantic features of "people who acted on their regret" share common patterns across career, immigration, and relationship contexts.

0.638
Held-out career
95% CI [0.614, 0.663]
0.697
Held-out immigration
95% CI [0.632, 0.764]
0.609
Held-out relationships
95% CI [0.577, 0.641]

Evidence

Figure: Leave-One-Domain-Out AUC (Random Forest with TF-IDF + VADER + Embeddings)
Immigration benefits most from transfer learning because its small sample (n=297) gains from the larger training set of career + relationships (n=3,307). Relationships is hardest to predict from other domains, possibly because interpersonal regret uses more domain-specific vocabulary.
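The leave-one-domain-out loop can be sketched on synthetic stand-in data (the feature matrix, labels, and domain sizes below are illustrative, not the project's features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 domains with a shared signal in the first 2 features.
domains = np.repeat([0, 1, 2], [600, 100, 400])
X = rng.normal(size=(len(domains), 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, len(domains)) > 0).astype(int)

aucs = {}
for held_out in [0, 1, 2]:
    train, test = domains != held_out, domains == held_out
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train], y[train])
    aucs[held_out] = roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1])
print(aucs)
```
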

Next steps

Planned follow-ups
  • Domain adaptation techniques (e.g., domain-adversarial training) to improve transfer
  • Feature ablation to identify which features drive cross-domain vs domain-specific prediction
Extended 3

Cox regression: post length and urgency are the strongest hazard predictors

Method Cox Proportional Hazards model on 1,695 posts with temporal data. Covariates: domain, VADER sentiment, urgency score, log post length, topic, and emotion class. Log-rank tests for all domain pairs.
Finding The Cox model achieves concordance index 0.625. Log post length (HR = 1.53, p < 0.001) and urgency score (HR = 1.31, p < 0.001) are the strongest predictors. Immigration has a significantly lower hazard of reversal (HR = 0.64, p = 0.008). All pairwise log-rank tests are significant.

Key insight

Longer posts with urgency language predict faster reversal. The hazard ratio of 1.53 means each one-unit increase in log post length raises the reversal hazard by 53% (equivalent to a doubling of length if base-2 logs are used). Immigration's HR of 0.64 quantifies the structural suppression of reversal: immigration posters have a 36% lower hazard of acting on regret compared to career posters.

0.625
Concordance index
1.53
HR: log post length
p < 0.001
1.31
HR: urgency score
p < 0.001
0.64
HR: immigration
p = 0.008

Evidence

Figure: Cox PH Hazard Ratios with 95% CIs (red = p < 0.05)
The forest plot shows that log post length, urgency score, and domain (immigration) are the only significant predictors at the 0.05 level. VADER sentiment and individual emotions do not reach significance in the multivariate setting, suggesting their effects are captured by the text-length and urgency proxies. The "surprise" emotion approaches significance (HR = 1.42, p = 0.056).
Log-rank test | Statistic | p-value
Career vs Immigration | 8.42 | 0.004
Career vs Relationships | 11.99 | 0.0005
Immigration vs Relationships | 20.21 | 7 x 10^-6
Extended 4

Linguistic markers: action words predict reversal, reflection words predict non-reversal

Method Log-odds ratio analysis (Monroe et al. 2008 style with Dirichlet prior) comparing word usage in reversal vs non-reversal posts, early (<180d) vs late (>365d) regret, and domain-specific vocabulary.
Finding Words most associated with reversal are action verbs: "left" (z=5.73), "quit" (z=3.66), "leaving" (z=3.56). Words associated with non-reversal are help-seeking terms: "advice" (z=-4.03), "change" (z=-3.01), "help" (z=-2.75). Early regret features concrete words ("company", "new", "months") while late regret features abstract terms ("degree", "career", "years").

Key insight

Reversal language is action-oriented; non-reversal language is deliberative. People who reversed their decisions use past-tense action verbs ("left", "quit"), while those who did not reverse are still seeking input ("advice", "help"). This linguistic separation is consistent with the reversal labels capturing action-oriented language (see the label-validation caveat in Result 6) and suggests that language alone encodes decision state.
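The Monroe et al. (2008) statistic can be sketched as follows; a small symmetric Dirichlet prior stands in here for the informative prior, and the toy counts are illustrative:

```python
import numpy as np

def log_odds_z(counts_a, counts_b, alpha=0.01):
    """Log-odds ratio with a symmetric Dirichlet prior (Monroe et al. 2008
    style). Returns z-scores: positive = overrepresented in corpus A."""
    counts_a = np.asarray(counts_a, float)
    counts_b = np.asarray(counts_b, float)
    a = alpha * np.ones_like(counts_a)
    na, nb, a0 = counts_a.sum(), counts_b.sum(), a.sum()
    delta = (np.log((counts_a + a) / (na + a0 - counts_a - a))
             - np.log((counts_b + a) / (nb + a0 - counts_b - a)))
    var = 1.0 / (counts_a + a) + 1.0 / (counts_b + a)
    return delta / np.sqrt(var)

# Toy vocabulary: ["left", "quit", "advice", "help"], illustrative counts.
z = log_odds_z([50, 40, 5, 4], [10, 8, 60, 45])
```
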

Evidence

Figure: Words Most Associated with Reversal (red) vs Non-Reversal (blue), z-scores from log-odds ratio
The clear semantic separation between action verbs (reversal) and deliberation terms (non-reversal) suggests that automated classification can reliably detect decision state from text. The early-vs-late analysis reveals that early regret is more concrete and situational ("company", "months", "girlfriend"), while late regret is more abstract and structural ("degree", "career", "school").

Modal verb analysis

"Wish" dominates across all domains (~15 per 1,000 tokens), consistent with regret's counterfactual nature. "Should" is most frequent in career (2.2 per 1k) vs relationships (1.2 per 1k), reflecting career's prescriptive advice culture. "Would" peaks in immigration (3.9 per 1k), suggesting hypothetical reasoning about immigration paths not taken.

Extended 5

Sentence embeddings reveal semantic clustering by domain with reversal overlap

Method all-MiniLM-L6-v2 sentence embeddings (384-d) reduced via PCA (50 components, 60.6% variance explained) and UMAP (2-d) for visualization.

Evidence

Figure: UMAP 2-d projection of regret post embeddings, colored by domain (left) and reversal (right)
The domain plot shows clear semantic clusters, indicating that career, immigration, and relationship regret occupy distinct regions of embedding space. The reversal plot shows that reversal posts are distributed throughout all domain clusters rather than forming a separate cluster, consistent with the finding that reversal is predicted by structural features (domain, timing) rather than semantic content alone. PCA embedding features (10 dimensions) are included as predictors in the cross-domain generalization model.
Extended 6

Subreddit-level variation adds signal beyond domain, but core effects are stable

Method Three nested logistic regression models: (1) features only, (2) + domain dummies, (3) + subreddit dummies (9 subreddits). Likelihood ratio tests assess incremental fit. Coefficient stability analysis tracks how feature effects change across specifications.
Finding Domain significantly improves model fit (LR p < 0.001). Subreddits add further significant signal beyond domain (LR p = 0.003), but feature coefficients remain stable across all three models (urgency OR: 1.576 -> 1.588 -> 1.570; has_time OR: 1.814 -> 1.849 -> 1.856).

Key insight

Feature effects are robust to subreddit-level confounding. While individual subreddits (notably USCIS, OR=0.39) show distinct reversal patterns beyond domain grouping, the core predictive features (urgency, has_time, VADER) are stable across model specifications. This indicates the main findings are not artifacts of subreddit-specific norms.

Model | pseudo-R2 | AIC | LR test p
Features only | 0.026 | 4653.4 | --
+ Domain | 0.029 | 4642.7 | < 0.001
+ Subreddit | 0.034 | 4634.5 | 0.003
The USCIS subreddit (OR=0.39, p=0.012) shows particularly depressed reversal rates even beyond the immigration domain effect, reflecting the extreme structural constraints of visa processing. Feature coefficient stability (less than 5% change across specifications) gives confidence that our main findings are not confounded by subreddit culture differences.
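The LR test p-values in the table can be recovered from the AIC column alone, assuming the domain model adds 2 dummy parameters and the subreddit model adds 6 more (since AIC = 2k - 2 ln L, the LR statistic is 2*added_params minus the AIC drop):

```python
from scipy.stats import chi2

def lr_test_from_aic(aic_small, aic_big, added_params):
    """Likelihood-ratio test recovered from AIC values:
    AIC = 2k - 2lnL, so LR = 2*added_params - (aic_big - aic_small)."""
    lr = 2 * added_params - (aic_big - aic_small)
    return lr, chi2.sf(lr, df=added_params)

# Assumed parameter counts: 2 domain dummies, then 6 extra subreddit dummies.
lr_dom, p_dom = lr_test_from_aic(4653.4, 4642.7, added_params=2)
lr_sub, p_sub = lr_test_from_aic(4642.7, 4634.5, added_params=6)
print(f"domain: LR = {lr_dom:.1f}, p = {p_dom:.4f}")
print(f"subreddit: LR = {lr_sub:.1f}, p = {p_sub:.4f}")
```
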
Extended 7

Future-oriented language is the strongest psycholinguistic predictor of reversal

Method Five LIWC-style features computed via word-list matching: agency, hedging, social embeddedness, causal reasoning, and future orientation (normalized per 1,000 words).
Finding Future orientation shows the strongest association with reversal (r = -0.097, p < 10^-8): posts with more future-oriented language ("will", "plan to", "going to") are significantly less likely to contain reversal indicators. Causal reasoning (r = -0.051, p = 0.002) and social embeddedness (r = -0.042, p = 0.011) are also significant negative predictors.

Key insight

People who have already acted use less future-oriented language. The negative correlation between future orientation and reversal indicates that posts containing action language ("quit", "left") naturally contain less prospective language. Conversely, posts heavy in "plan to", "want to", "hope to" reflect ongoing deliberation rather than completed action. This validates the reversal construct: it captures posts where action has occurred, not posts where action is merely contemplated.
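The word-list features reduce to a simple per-1,000-words rate. A minimal sketch, with a tiny illustrative word list standing in for the study's lexicons:

```python
import re

# Illustrative word list only -- not the study's future-orientation lexicon.
FUTURE_WORDS = {"will", "plan", "going", "hope", "want"}

def future_orient_score(text: str) -> float:
    """Future-orientation rate per 1,000 words via word-list matching."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in FUTURE_WORDS)
    return 1000.0 * hits / len(tokens)

score = future_orient_score("I plan to quit and I will move, I hope it works")
```
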

r = -0.097
Future orient. vs reversal
p < 10^-8
r = -0.051
Causal reasoning vs reversal
p = 0.002
r = -0.042
Social embed. vs reversal
p = 0.011
Figure: Psycholinguistic Features by Domain (per 1,000 words)
Figure: Psycholinguistic Feature Distributions: Reversed vs Not Reversed
Extended 8

Propensity score matching: career-immigration reversal gap narrows but persists

Method 1:1 nearest-neighbor propensity score matching (caliper = 0.10) on 9 covariates: VADER sentiment, urgency, log post length, post year, and 5 psycholinguistic scores. Career (n=1,998) vs Immigration (n=297), yielding 297 matched pairs.
Finding The unmatched reversal gap is 10.3pp (career 38.5% vs immigration 28.3%). After matching, the gap narrows to 7.1pp (career 35.4% vs immigration 28.3%), with borderline significance (McNemar p = 0.074). This indicates that approximately 31% of the raw gap is explained by observable linguistic and temporal differences.

Key insight

Structural barriers explain part of the reversal gap, but a residual gap remains. Even after matching on text features, urgency, and temporal factors, immigration posts still show 7.1 percentage points lower reversal. This residual is consistent with unmeasured structural barriers (visa systems, legal constraints) that suppress action regardless of emotional or linguistic characteristics of the regret post.
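The matching step can be sketched on synthetic data (group sizes and covariates below are stand-ins; the 0.10 caliper follows the method description):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-in: "treated" = immigration-like group, control = career-like.
X = np.vstack([rng.normal(0.3, 1, (150, 4)), rng.normal(0.0, 1, (800, 4))])
treated = np.r_[np.ones(150, int), np.zeros(800, int)]

# 1) Propensity scores from a logistic model of group membership.
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# 2) 1:1 nearest-neighbor matching with a caliper of 0.10,
#    drawing controls without replacement.
t_idx = np.where(treated == 1)[0]
available = set(np.where(treated == 0)[0])
pairs = []
for i in t_idx:
    cand = min(available, key=lambda j: abs(ps[i] - ps[j]))
    if abs(ps[i] - ps[cand]) <= 0.10:
        pairs.append((i, cand))
        available.remove(cand)
print(f"{len(pairs)} matched pairs")
```
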

10.3pp
Unmatched gap
Career vs Immigration
7.1pp
Matched gap (ATT)
After PSM
p = 0.074
McNemar test
297 matched pairs
31%
Gap explained
By observables
Figure: Propensity Score Matching: before/after comparison and PS distributions
Extended 9

Community engagement does not predict reversal: regret-action decisions are internal

Method Reddit post score (log), comment count (log), upvote ratio, and comment sentiment tested as predictors. Random Forest classifier with base features (sentiment + psycholinguistic) vs base + engagement features.
Finding Post score is the only engagement metric significantly correlated with reversal (r = 0.06, p < 0.001), but adding engagement features improves AUC by only +0.005 (from 0.667 to 0.672). High-engagement posts (above-median comments) show virtually identical reversal rates to low-engagement posts (38.2% vs 37.9%, chi-square p = 0.86).

Key insight

Whether a regret post gets community support has negligible effect on whether the person acts on their regret. This is a meaningful null finding: social validation through Reddit (upvotes, supportive comments) does not appear to facilitate behavioral change. Regret-to-action decisions are largely internal or driven by structural factors, not community reinforcement.

- +0.005: AUC improvement from engagement features
- 0.672: full-model AUC (5-fold CV)
- p = 0.86: chi-square test of engagement x reversal

Figure (Community Features Analysis): Community Engagement Features, model comparison, feature importances, and engagement-reversal relationship.
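The engagement x reversal check is a 2x2 chi-square test of independence. A self-contained sketch of the arithmetic; the cell counts below are illustrative stand-ins for the high/low-engagement split, not the study's actual table:

```python
from math import erf, sqrt

def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) for a
    2x2 table [[a, b], [c, d]], e.g. rows = high/low engagement,
    columns = reversed yes/no. Returns (statistic, p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi-square is the square of a standard normal,
    # so P(chi2 > x) = 2 * (1 - Phi(sqrt(x))).
    phi = 0.5 * (1 + erf(sqrt(stat) / sqrt(2)))
    return stat, 2 * (1 - phi)

# Illustrative: near-identical reversal rates in high- vs low-engagement posts
stat, p = chi2_2x2(382, 618, 379, 621)
print(stat, p)
```

A large p-value here means the observed reversal-rate difference between engagement groups is well within chance variation, matching the null finding above.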

Regret Analyzer

Paste text describing a decision you're reflecting on. Our ML models analyze it for emotional tone, classify the life domain, and estimate the probability of decision reversal based on patterns from 3,604 Reddit regret posts.

The analyzer reports four outputs: detected emotion, life domain, reversal probability, and urgency level.
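Conceptually, the reversal-probability output is a classifier score over engineered text features. A toy sketch of that idea; the function name, keyword lists, and weights are all hypothetical illustrations standing in for the deployed model's richer feature set (VADER sentiment, urgency_score, embeddings):

```python
from math import exp

# Hypothetical keyword lists for illustration only (not the deployed lexicons).
URGENT_WORDS = {"asap", "immediately", "urgent", "desperate"}
ACTION_WORDS = {"quit", "resigned", "ended it", "moved back"}

def estimate_reversal_probability(text: str) -> float:
    """Toy reversal-probability estimate: a logistic function over two
    simple keyword-count features. Substring matching is a deliberate
    simplification; a real pipeline would tokenize first."""
    lowered = text.lower()
    urgency = sum(w in lowered for w in URGENT_WORDS)
    action = sum(w in lowered for w in ACTION_WORDS)
    z = -0.5 + 0.4 * action + 0.2 * urgency  # illustrative weights
    return 1 / (1 + exp(-z))
```

The deployed analyzer presumably learns its weights from the 3,604-post corpus rather than hard-coding them; this sketch only shows the shape of the feature-to-probability mapping.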

Limitations

  1. Selection bias: Reddit users are not representative of the general population. Regret posts self-select for people willing to share publicly, likely skewing toward more intense or unresolved experiences.
  2. Reversal labels are lexical proxies: Reversal is inferred from action verbs ("left", "quit", "ended"), not confirmed behavioral change. Comment-thread validation (kappa = 0.03, n = 700) shows near-chance agreement with comment-verified outcomes, indicating that keyword labels capture "action language" rather than verified reversal. All reversal-related findings should be interpreted as reflecting linguistic patterns in regret narratives.
  3. Temporal extraction coverage: Despite improvement from 10.7% to 47.2%, over half of posts lack extractable time-to-regret information. The extracted subset may differ systematically from posts without temporal markers.
  4. Sample imbalance: Immigration (n=297) is substantially smaller than career (n=1,998) and relationships (n=1,309), limiting statistical power for immigration-specific analyses.
  5. Cross-sectional design: Each post is a snapshot. We cannot track whether regret evolved over time or whether reversal occurred after posting.
  6. Model ceiling: Best AUC of 0.652 for reversal prediction indicates that text and metadata features capture only moderate signal. The inherent stochasticity of human decision-making likely imposes a natural ceiling on predictive accuracy.
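The kappa = 0.03 figure in limitation 2 is Cohen's kappa between keyword labels and comment-thread annotations; a value near zero means agreement barely exceeds chance. A minimal sketch of the computation for two binary annotators (the pairs in the test usage are illustrative):

```python
def cohens_kappa(pairs: list[tuple[int, int]]) -> float:
    """Cohen's kappa for two binary annotators.

    pairs: (keyword_label, comment_thread_label) per post, each 0 or 1.
    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    Assumes the two annotators are not in perfect-chance lockstep
    (pe < 1), otherwise the denominator is zero.
    """
    n = len(pairs)
    po = sum(a == b for a, b in pairs) / n  # observed agreement rate
    pa = sum(a for a, _ in pairs) / n       # annotator A positive rate
    pb = sum(b for _, b in pairs) / n       # annotator B positive rate
    pe = pa * pb + (1 - pa) * (1 - pb)      # agreement expected by chance
    return (po - pe) / (1 - pe)
```

High raw agreement can still yield kappa near zero when both annotators mostly assign the majority label, which is exactly the failure mode the validation exposed.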
Full Synthesis

Across 3,604 regret posts spanning career, immigration, and relationships, this analysis identifies a consistent pattern: structural constraints, temporal dynamics, and psycholinguistic features are stronger predictors of action-oriented regret language than emotional intensity or community engagement alone.

Domain matters (p < 0.001): immigration posters show 36% lower hazard of reversal (Cox HR = 0.64). Propensity score matching narrows the career-immigration gap from 10.3pp to 7.1pp (ATT, McNemar p = 0.074), indicating that approximately 31% of the raw domain gap is explained by observable linguistic and temporal differences. The remaining 69% is consistent with unmeasured structural barriers.

Temporal analysis reveals that reversal patterns are not static: they diverge over time (year x domain interaction, p = 0.009), with career reversal increasing while immigration reversal declines. The Great Resignation (2021-2022) counterintuitively showed lower career reversal (33.6% vs 39.2%), suggesting this era's career regret reflected difficulty re-entering employment rather than desire to undo a quit.

Psycholinguistic features provide the richest text-based signal: future orientation (r = -0.097, p < 10⁻⁸), causal reasoning (r = -0.051, p = 0.002), and social embeddedness (r = -0.042, p = 0.011) all significantly predict reversal. Community engagement (upvotes, comments) adds negligible predictive value (+0.005 AUC), indicating that regret-to-action decisions are internal rather than socially facilitated.

Critical methodological caveat: Comment-thread validation (n=700, kappa = 0.03) reveals that keyword-based "reversal" labels capture action language rather than confirmed behavioral change. This reframes all findings: what we call "reversal" is best interpreted as "linguistic markers of having acted or intending to act" rather than verified post-regret behavior. This is the single most important direction for future improvement.

For future work, the most promising directions are: (1) developing a validated reversal label via LLM annotation + longitudinal tracking of OP follow-up posts; (2) propensity score matching with additional structural covariates (visa type, employment sector) that cannot currently be extracted from text; (3) expanding the immigration corpus through deeper historical scraping; and (4) testing whether psycholinguistic features (agency, future orientation) causally mediate the domain-reversal relationship via formal mediation analysis.