Andrew Posted on 7:10 am

Data Retention in Sentinel – Where to Start

One of the critical aspects of using Microsoft Sentinel is understanding data retention, and how to get started.  Data retention in Microsoft Sentinel involves managing how long data is kept within your Log Analytics workspace.  This is crucial for compliance, incident response, log searchability, and cost management.

In this post, we will cover some of the basics, along with some often overlooked topics, that come up when creating a Microsoft Sentinel design or setting up the common ‘production trial’ SIEM demo.

The Case for Use Cases

Building use cases for your data requirements is critical to unlocking the full potential of Microsoft Sentinel, and ensuring that your security strategy aligns with your organizational goals.  By defining clear, specific use cases, you can tailor your data collection and analysis efforts to address the most relevant threats and compliance needs.  This targeted approach not only improves the efficiency and effectiveness of your security operations but also ensures that you are collecting the right data to support detailed investigations and proactive threat hunting.

Additionally, well-defined use cases help in optimizing your resource allocation.  Good use cases enable you to prioritize high-value data sources and focus your efforts on the most impactful security measures.  This clarity helps in reducing noise and false positives, making your security alerts more actionable and reducing alert fatigue among your security team.

Ultimately, developing robust use cases is a proactive step towards a more resilient and responsive security posture, ensuring that you are prepared to tackle evolving threats and maintain regulatory compliance.  I frequently talk with organizations about ingesting only the data that is required, either by a use case to analyze an event or by a legal requirement.

Data Retention Standards

A large area of discussion always centers on which data retention standards should be considered for a setup.  The short answer: have your Information Officer, Legal team, or an Executive approve the long-term data retention period.  The immediate settings should likely be 90 days for searchable logs, and 90 days to 1 year for the initial long-term setting.  This gives your team up to 1 year to get the long-term requirements sorted out with business approvals, so your organization stays compliant with any relevant requirements.

I have put together the summary table below of the standards and regulations I reference most often for data retention.

Key SIEM data retention guidelines and relevance for log management: 

| Standard/Guideline | Retention Recommendations | Key Notes |
| --- | --- | --- |
| NIST SP 800-92 | No specific period. Retention should meet legal, regulatory, and business needs. | Logs should be kept for as long as necessary for forensic and compliance purposes. |
| NIST SP 800-53 (Rev. 5) | Retain audit records to support audit, accountability, and investigation requirements. | Control AU-11 focuses on retaining logs for operational and legal needs. |
| CISA Best Practices | No mandated period. Retain logs to support incident response and threat hunting activities. | Emphasizes long enough retention to support security investigations. |
| Industry Regulations | Varies by industry: typically 1 to 7 years (e.g., PCI-DSS, HIPAA, SOX). See the breakdown below. | Driven by specific industry compliance requirements. |
| Microsoft Sentinel Default | 90 days free, configurable up to 730 days (2 years) with archiving beyond that for longer-term storage. | Flexibility to meet operational and regulatory requirements. |

Industry regulation breakdown:

| Regulation | Searchable | Retention |
| --- | --- | --- |
| PCI-DSS | 90 days | 1 year |
| Finance CA (OSFI) | N/A | 5 years |
| Finance USA (SEC) | N/A | 3-6 years (Immutable) |
| Health CA (ePHI) | N/A | 6 years (== HIPAA) |
| BC Prov | N/A | 3-5 years |
| MB Prov | N/A | 3-5 years |
| GDPR | N/A | 3 years start |

While GDPR does not provide a specific log retention statement, it does reference that logs should be retained for a reasonable period to support compliance, response, and investigations.
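To turn the table into a first planning number, the long-term figures can be captured in a small lookup. This is only a sketch: the dictionary simply copies the retention column from the table above, the function name `required_retention_years` is my own, and planning for the high end of each range is an assumption until your Legal team confirms the real requirement.

```python
# Long-term retention figures copied from the table above, in years.
# Ranges such as "3-6 years" are stored as (low, high). GDPR's "3 years start"
# is treated as a starting point, not a mandated period.
RETENTION_YEARS = {
    "PCI-DSS": (1, 1),
    "Finance CA (OSFI)": (5, 5),
    "Finance USA (SEC)": (3, 6),
    "Health CA (ePHI)": (6, 6),
    "BC Prov": (3, 5),
    "MB Prov": (3, 5),
    "GDPR": (3, 3),
}


def required_retention_years(regulations):
    """Return the longest upper-bound retention among the given regulations.

    Assumption: when several regulations apply, plan for the strictest one.
    Returns 0 when no regulation is listed.
    """
    unknown = [r for r in regulations if r not in RETENTION_YEARS]
    if unknown:
        raise KeyError(f"No retention figure recorded for: {unknown}")
    return max((RETENTION_YEARS[r][1] for r in regulations), default=0)


# Example: an organization subject to both PCI-DSS and OSFI.
print(required_retention_years(["PCI-DSS", "Finance CA (OSFI)"]))
```

The result is a conversation starter for the approval discussion, not a compliance decision.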

Interactive vs. Archive in Log Analytics Workspace

Microsoft Sentinel’s data retention policy is divided into two main categories: Interactive Retention and Archive Retention.

Interactive retention is the period during which data is readily available for querying and analysis.  By default, interactive retention is set to 90 days, but it can be extended based on organizational needs.  The main advantage of interactive retention is the speed and efficiency with which data can be accessed for immediate analysis.

Archive retention begins once the interactive retention period ends, at which point data can be moved to the archive.  This option is more cost-effective because it lowers storage costs, but the trade-off is slower query performance.  Archive retention can extend the total retention period well past the interactive window, so data can be kept for 2 years or more.  This ensures that historical data is available for compliance and long-term analysis.
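The relationship between the two periods is simple arithmetic: archive time is whatever remains of the total retention after the interactive window. The sketch below models that split. The class and method names are hypothetical, and the `retentionInDays` / `totalRetentionInDays` property names reflect my understanding of the Log Analytics tables API, so verify them against the current Microsoft documentation before using them in a real request.

```python
from dataclasses import dataclass

MAX_INTERACTIVE_DAYS = 730  # interactive ceiling quoted in the table above


@dataclass(frozen=True)
class RetentionPlan:
    """Interactive and total retention for one table (hypothetical helper)."""

    interactive_days: int
    total_days: int

    def __post_init__(self):
        if not 0 < self.interactive_days <= MAX_INTERACTIVE_DAYS:
            raise ValueError("interactive retention must be 1-730 days")
        if self.total_days < self.interactive_days:
            raise ValueError("total retention cannot be shorter than interactive")

    @property
    def archive_days(self):
        # Days the data spends in archive after the interactive window ends.
        return self.total_days - self.interactive_days

    def to_table_properties(self):
        # Assumed request-body shape for a table update; check the API docs.
        return {
            "properties": {
                "retentionInDays": self.interactive_days,
                "totalRetentionInDays": self.total_days,
            }
        }


# The starting point suggested earlier: 90 days searchable, 1 year in total.
starter = RetentionPlan(interactive_days=90, total_days=365)
print(starter.archive_days)
print(starter.to_table_properties())
```

Keeping the two numbers together in one object makes it harder to configure a total that is accidentally shorter than the interactive period.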

Analytics vs. Basic Logs

Microsoft Sentinel allows for different types of logs to be ingested, categorized mainly as Analytics Logs and Basic Logs:

Analytics logs are enriched with additional contextual information and are optimized for complex queries and detailed analysis.  They are ideal for in-depth threat detection and investigation.  Analytics logs incur higher storage costs, but they search faster and provide valuable insights that are critical for a robust security posture.

Basic logs are a more cost-effective option for storing large volumes of data.  They are suited for simpler queries and are best used for operational monitoring and basic security checks.  While basic logs might not offer the same level of detail as analytics logs, they can still provide essential information for day-to-day operations and initial threat detection.
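One way to make the Analytics vs. Basic decision repeatable is to write the rule down per log source. The sketch below is a deliberately simple rule of thumb based on the descriptions above, not Microsoft guidance: `LogSource`, `suggest_plan`, and the example sources are all hypothetical, and a real design would also need to confirm that a given table supports the plan you pick.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    """A log source described by how the security team uses it (hypothetical)."""

    name: str
    used_in_detections: bool      # feeds detection rules or threat hunting
    needs_complex_queries: bool   # investigations need detailed analysis


def suggest_plan(source):
    """Suggest 'Analytics' for detection and investigation data, else 'Basic'."""
    if source.used_in_detections or source.needs_complex_queries:
        return "Analytics"
    return "Basic"


sources = [
    LogSource("identity sign-ins", used_in_detections=True, needs_complex_queries=True),
    LogSource("verbose firewall flow logs", used_in_detections=False, needs_complex_queries=False),
]

for source in sources:
    print(f"{source.name}: {suggest_plan(source)}")
```

The two boolean questions map directly back to your use cases: if no use case needs the data for detection or detailed investigation, the cheaper plan is worth a look.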

Implementing Data Retention Strategies

Organizations need to strike a balance between cost and data accessibility when designing their data retention strategies in Microsoft Sentinel.  Here are some best practices:

Evaluate compliance requirements to understand the legal and regulatory requirements specific to your industry.  For instance, PCI-DSS mandates a minimum retention period of one year for audit logs.  See the table I’ve put together further up this post as a quick & handy reference to use.

Assess data usage to see how frequently different types of data are accessed and for what purposes.  Set shorter retention periods for less critical data and longer periods for data that is essential for compliance and detailed analysis.  I always feel a bit surprised when a Log Analytics workspace is configured with one flat retention period for everything instead of a customized fit to what the organization requires.  This step alone can make a big difference in storage cost.

Leverage archive retention to store historical data cost-effectively.  This approach allows you to maintain compliance while managing costs.  Again, use a variety of settings per table to achieve the best balance of retention vs. cost.
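To see why tailored retention matters, here is a rough steady-state storage estimate comparing a flat setting with a per-table fit. Everything numeric in it is a placeholder: the per-GB prices, the daily volumes, and the table names are invented for illustration, and the only figure taken from this post is the 90 days of included interactive retention. Substitute your own pricing and ingestion data.

```python
def monthly_storage_cost(daily_gb, interactive_days, total_days,
                         interactive_price=0.10, archive_price=0.02,
                         free_days=90):
    """Estimate steady-state monthly storage cost for one table.

    Prices are hypothetical per-GB-per-month placeholders, not real rates.
    Interactive data within `free_days` is treated as included at no charge.
    """
    billable_interactive_gb = daily_gb * max(interactive_days - free_days, 0)
    archive_gb = daily_gb * (total_days - interactive_days)
    return billable_interactive_gb * interactive_price + archive_gb * archive_price


# (name, daily GB, tailored interactive days, total days) - all illustrative.
tables = [
    ("authentication logs", 10, 180, 365),
    ("firewall flow logs", 100, 90, 365),
]

# Flat: everything kept interactive for the full year.
flat = sum(monthly_storage_cost(gb, total, total) for _, gb, _, total in tables)
# Tailored: shorter interactive window, remainder in archive.
tailored = sum(monthly_storage_cost(gb, inter, total) for _, gb, inter, total in tables)

print(f"flat: {flat:.2f}  tailored: {tailored:.2f}")
```

Even with made-up prices, the shape of the result holds: high-volume, rarely queried data is where moving days from interactive to archive changes the bill the most.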

Summary

By effectively managing data retention and utilizing the full spectrum of log types and retention options, you can ensure that your Microsoft Sentinel deployment is both cost-efficient and highly effective in detecting and responding to threats.

Data retention is a crucial element of a robust security strategy within Microsoft Sentinel, and by understanding and leveraging interactive and archive retention options, as well as effectively categorizing analytics and basic logs, organizations can optimize their security operations and ensure compliance with regulatory requirements.  

Implementing thoughtful data retention strategies helps in managing costs and also ensures that critical data is available for timely threat detection and response.  With Microsoft Sentinel’s flexible retention policies and comprehensive data connectors, your organization is well-equipped to navigate the complexities of modern cybersecurity. 

If you have more data retention questions, remember to check your business requirements first, and build use cases that prove out why the data is needed and what you intend to analyze within it.