Responding to Incidents with Microsoft Sentinel – Part 5 – Take Action with Automation
In today’s article we will build on previous automation experiences to further develop your Microsoft Sentinel automation powers! We will look at remediating incidents and alerts automatically, using both playbooks and Sentinel automation rules. In today’s fast-paced digital landscape, businesses rely heavily on technology to run their operations efficiently. As technology has advanced, the complexity of IT infrastructures has also increased, leading to a surge in IT incidents and support tickets.
To tackle this challenge head-on, organizations are turning to automation to streamline ticket remediation processes. In this article, we will explore how businesses can leverage automation to auto-remediate tickets based on the research done using other automation solutions.
If you missed it, the previous article on building your own playbook to look up VirusTotal IP address information is right here. If you want to read the article series from the beginning, you can start here.
Understanding Ticket Remediation Automation
Ticket remediation automation refers to the process of utilizing intelligent algorithms and machine learning to diagnose, prioritize, and resolve IT incidents automatically. This approach significantly reduces the manual intervention required by IT teams, enabling them to focus on strategic tasks and providing enhanced customer support.
Simply put, we use Automation Rules in Sentinel to do some quick work for us. We can also do this through playbooks if our built-in logic determines that there is no need for humans to review that alert or incident.
Benefits of Auto-Remediation
1. Increased Efficiency: Manual ticket handling can be time-consuming and error-prone. By adopting auto-remediation, businesses can accelerate the incident resolution process, leading to reduced downtime and improved service levels.
2. Cost Savings: Automating ticket remediation eliminates the need for hiring additional support staff, resulting in cost savings for the organization.
3. Enhanced Customer Experience: Fast ticket resolution leads to higher customer satisfaction. By reducing the mean time to resolution (MTTR), businesses can enhance their reputation and brand value. This also means that as a SOC, we are responding to incidents faster and reducing our ‘blast radius’.
4. Proactive Approach: Auto-remediation can go beyond resolving known issues. It can identify potential problems before they escalate into major incidents, allowing IT teams to take proactive measures.
Researching Other Automation Solutions
Before implementing auto-remediation, it’s crucial to conduct detailed research on existing automation solutions within the organization. By analyzing these solutions, you can identify the repetitive tasks and common issues that can be automated effectively.
Some key areas to investigate include:
1. Incident Handling: Study how incidents are currently managed, the steps involved in their resolution, and the average time taken to close them. Identify patterns in frequently occurring incidents to understand their underlying causes.
2. Log Analysis: Examine the logs generated by various systems and applications. This data can unveil valuable insights into recurring errors or anomalies that can be addressed through automation.
3. Service Desk Operations: Investigate how service desk agents interact with tickets, prioritize incidents, and escalate critical issues. Understanding this workflow will help design an automated remediation system that aligns with existing practices.
4. Event Monitoring: Analyze the event monitoring tools in place to detect system failures or performance bottlenecks. Integrating auto-remediation with these tools can ensure swift resolution of detected issues.
Designing the Auto-Remediation System
Once the research is complete and you have better information on what should be automated and what can be done quickly, you can proceed with designing the auto-remediation system.
Much like threat-hunting, we want to define a clear hypothesis so that we can work logically towards a solution. Here are some essential steps to consider in the system design:
1. Define Clear Objectives: Set specific goals for the auto-remediation system. Identify the types of incidents that can be auto-resolved, the criteria for prioritizing them, and the expected reduction in MTTR.
2. Choose the Right Automation Tools: Select automation tools that align with your existing IT infrastructure and can seamlessly integrate with your monitoring and ticketing systems.
3. Create Playbooks: Develop detailed playbooks outlining the step-by-step remediation procedures for different types of incidents. These playbooks should be reviewed regularly and updated as needed.
4. Test Thoroughly: Before deploying the auto-remediation system in a live environment, conduct extensive testing to ensure its accuracy, effectiveness, and resilience.
5. Create Sentinel Automation Rules: Start with known low-value alerts. Writing automation rules that add a comment and automatically close those low-value alerts can save many hours of SOC analyst time each week.
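To make the "expected reduction in MTTR" objective from step 1 measurable, it helps to agree on how MTTR is calculated before and after automation. Here is a minimal sketch in Python, assuming each incident is a simple dictionary with hypothetical "created" and "closed" timestamps. This is sample data for illustration only and is not how Sentinel exports incidents.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (closed - created) over incidents that have been closed.

    Returns None when no incident in the list has been closed yet.
    """
    durations = [
        inc["closed"] - inc["created"]
        for inc in incidents
        if inc.get("closed") is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data: two closed incidents and one still open.
incidents = [
    {"created": datetime(2024, 1, 1, 9, 0), "closed": datetime(2024, 1, 1, 11, 0)},
    {"created": datetime(2024, 1, 1, 10, 0), "closed": datetime(2024, 1, 1, 14, 0)},
    {"created": datetime(2024, 1, 2, 8, 0), "closed": None},
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 3:00:00 (average of 2 hours and 4 hours)
```

Running the same calculation on incidents from before and after you enable a rule gives you a simple number to compare against your objective.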
Building Automation Rules
Let’s jump into our testing environment in the Azure portal at https://portal.azure.com.
Open your Microsoft Sentinel workspace and head down to the bottom under Configuration > Automation. I’m hoping that you’re getting more comfortable through this little series of articles on handling incidents and alerts in Sentinel. Always explore your options and find out what all these features do. In your testing environment of course! 🙂
When you click on Create > Automation Rule, you will see a new blade open up. Let’s start adding some logic to this.
Enter a name such as “Test closing Incidents”. We will delete this rule later on when we are done.
Under Trigger, choose When Incident is Created.
Next, under Conditions select Incident Provider – Equals All. Then choose Analytics Rule Name – Contains and then click on the drop-down ALL.
Now a big box with all your Analytics rules will display. This can be a tricky one, but for today we will select “AD User Enabled and Password not set Within 48hours”.
This automation rule will ONLY look at incidents generated from this Analytics Rule. This is a key point to understand. You can select multiple Analytics rules in this field, or ALL. My preference is to create targeted rules with a smaller scope, but you may wish to use a broader scope, so please do customize to meet your own needs.
Next, under Actions, choose Change Status, set it to Closed with the classification Benign Positive, and enter a comment.
OK, let’s talk about the comment for a moment.
I encourage starting every comment with a consistent phrase such as “Closed by Automation Rule -” so that it is easy to filter for reporting purposes. This matters if your management team likes to see metrics such as Closed/Open/In Progress and wants to know how many incidents were closed by SOC team members and how many were closed by automation. The importance of this point cannot be overstated.
Next up is Rule Expiration. For our test today we will leave the default of Indefinite, and you can set the order as needed.
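To show why a consistent comment prefix pays off, here is a small sketch in Python that splits closed incidents into "closed by automation" and "closed by analyst" for a metrics report. The incident dictionaries and the "status" and "closing_comment" field names are hypothetical stand-ins for whatever your export or query returns. Only the prefix text comes from the example above.

```python
from collections import Counter

# The consistent prefix suggested above for automation rule comments.
AUTOMATION_PREFIX = "Closed by Automation Rule -"


def closure_source(incident):
    """Label an incident for reporting based on status and closing comment."""
    if incident["status"] != "Closed":
        return incident["status"]
    comment = incident.get("closing_comment", "")
    if comment.startswith(AUTOMATION_PREFIX):
        return "Closed (automation)"
    return "Closed (analyst)"


def summarize(incidents):
    """Count incidents per reporting label."""
    return Counter(closure_source(inc) for inc in incidents)


# Hypothetical sample data for illustration.
sample = [
    {"status": "Closed", "closing_comment": "Closed by Automation Rule - known low-value alert"},
    {"status": "Closed", "closing_comment": "Reviewed with the account owner, expected activity"},
    {"status": "Closed", "closing_comment": "Closed by Automation Rule - known low-value alert"},
    {"status": "In Progress"},
    {"status": "New"},
]

summary = summarize(sample)
for label, count in sorted(summary.items()):
    print(f"{label}: {count}")
```

Because the check is a simple "starts with" test, it only works if every automation rule uses exactly the same opening phrase, which is the reason to standardize it early.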
Click on Apply to save your rule.
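If it helps to reason about what the rule will do before you test it, the logic we just configured can be sketched as a tiny model in Python. This is only an illustration of the trigger, condition, and action flow. It is not the Sentinel API, and the class and field names below are made up for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Incident:
    # Hypothetical, simplified view of an incident.
    title: str
    analytics_rule_name: str
    status: str = "New"
    classification: str = ""
    closing_comment: str = ""


@dataclass(frozen=True)
class AutomationRule:
    # Hypothetical model of the rule built in the steps above.
    name: str
    rule_names: frozenset  # analytics rules selected in the condition
    classification: str
    comment: str
    enabled: bool = True

    def matches(self, incident):
        """Condition: the rule is enabled and the incident's analytics rule is in scope."""
        return self.enabled and incident.analytics_rule_name in self.rule_names

    def apply(self, incident):
        """Action: close matching incidents and leave everything else untouched."""
        if not self.matches(incident):
            return incident
        return replace(
            incident,
            status="Closed",
            classification=self.classification,
            closing_comment=self.comment,
        )


rule = AutomationRule(
    name="Test closing Incidents",
    rule_names=frozenset({"AD User Enabled and Password not set Within 48hours"}),
    classification="Benign Positive",
    comment="Closed by Automation Rule - known low-value alert",
)

in_scope = Incident("New AD user", "AD User Enabled and Password not set Within 48hours")
out_of_scope = Incident("Suspicious sign-in", "Some other analytics rule")

print(rule.apply(in_scope).status)      # Closed
print(rule.apply(out_of_scope).status)  # New
```

The second incident stays untouched because its analytics rule is not in scope, which is exactly the behavior you should confirm in your test environment.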
If we open the rule up after saving, notice that the rule is Enabled. If you want to adjust things or turn the rule off, you can set it to Disabled.
Now that the rule is running in your test environment, you can go ahead and force some test incidents, let them naturally happen, and observe your rule in action!
Once you complete your testing, remember to disable or delete your rule by visiting the same Automation Rules blade, selecting your test rule, and then clicking on Remove in the top navigation bar on that blade.
Conclusion
Incorporating auto-remediation into incident management processes can be a game-changer for businesses. By leveraging the research done using other automation solutions, organizations can proactively resolve IT incidents, reduce downtime, and enhance customer satisfaction. With a well-designed auto-remediation system in place, IT teams can focus on strategic tasks, thereby driving efficiency, cost savings, and improved service delivery.
Embracing automation is the key to staying on top of the ever-increasing number of incidents in most SOCs today. As we continue our journey to automate and improve ourselves, remember to consider where you may start to explore AI options in your daily world. We’ll start to explore some next-generation AI solutions in the future on this blog, so stay tuned as Microsoft Sentinel continues to lead the SIEM & SOAR landscape.