Inside the 2023 Sumo Logic Security Intrusion and Response
How did Sumo Logic respond after compromised credentials led to an intrusion of an AWS account?
In the fall of 2023, Joe Kim was getting ready to go on a trip to Australia and South Korea to meet with Sumo Logic customers. While packing, the Sumo Logic president and CEO received a message from one of the company’s developers, who had noticed something odd with the company’s Trufflehog client agent. The issue was flagged with Sumo Logic’s security operations center (SOC), which began to investigate.
Kim and George Gerchow, the Sumo Logic chief security officer at the time of the incident, pull the curtain back to offer an inside look at the security incident and how the company’s teams worked together internally and with customers to respond.
Discovering the Intrusion
The developer who first raised the alarm was deprecating the use of static infrastructure credentials, Sumo Logic shared in a blog post. He noticed strange activity from Trufflehog when examining the company’s CloudTrail logs in AWS.
“What Trufflehog does is it looks for secret keys across an environment. In this case, it was looking across all of our customers within AWS. And it started doing it en masse,” Gerchow tells InformationWeek.
Once the company’s SOC determined that the activity was in fact malicious, it was off to the races.
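For readers who want a concrete picture of what "strange activity" in CloudTrail logs can look like, here is a minimal sketch, not Sumo Logic's actual detection. It counts events per access key and flags any key that exceeds a threshold. The `userIdentity.accessKeyId` and `eventName` fields follow the CloudTrail record format; the function name, the threshold, and the sample records are hypothetical.

```python
from collections import Counter


def flag_noisy_keys(records, threshold):
    """Return access key IDs that appear in more than `threshold` records."""
    counts = Counter(
        record["userIdentity"]["accessKeyId"]
        for record in records
        # Some events (for example console logins) carry no access key.
        if "accessKeyId" in record.get("userIdentity", {})
    )
    return sorted(key for key, n in counts.items() if n > threshold)


# Fabricated records for illustration only; not data from the incident.
records = (
    [{"eventName": "ListSecrets", "userIdentity": {"accessKeyId": "KEY_A"}}] * 5
    + [{"eventName": "GetObject", "userIdentity": {"accessKeyId": "KEY_B"}}]
    + [{"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"}}]
)

noisy = flag_noisy_keys(records, threshold=3)
```

A raw count like this is only a starting point. A real SOC would typically also weigh the source IP address, the user agent, and the time window before calling activity malicious.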
Leaping Into Action
Over his years with the company, Gerchow experienced a number of security incidents, but the majority of them were related to external factors, like the Log4j vulnerability. “So, it was pretty terrifying because we knew that something was wrong there, and we knew that it was potentially us who created [it],” he shares.
But as a security leader, a big part of Gerchow’s job is about responding to incidents like this. He did not feel intense pressure as he and his team were in action. “I just didn't feel it that much because this is what we do,” he says. “It wasn't until it was all said and done, like maybe a month later, when we really started going through the postmortem that I was like, ‘I'm tired. I'm exhausted.’”
Gerchow and his team immediately took steps to lock down Sumo Logic’s environments and rotate all potentially exposed credentials. The team also monitored its logs, watching for any additional malicious activity.
“We always preach to customers the importance of keeping all of your log files, be it structured or unstructured files, in a single place, and make sure that at some point, that single source of truth is going to pay dividends for you,” says Kim.
Sumo Logic was able to make use of a “follow the sun” model, with teams in North America, Poland, and India, to ensure there was a constant focus on the incident. Even with teams around the world, a security incident can demand grueling working hours.
“The key players were doing easily 20-hour days, and they wanted to,” says Gerchow. “This is what we practice for. This is what all the tabletop exercises and everything else are about … these moments.”
By the time the incident response process started, Kim was on a plane, headed to his business meetings in the Asia-Pacific region. But the time difference became an advantage. “I was gone for the week, but I was actually on [the] majority [of] the phone calls as the teams were sharing information,” he says.
As the days ticked by, Sumo Logic team members were able to dig deeper into forensics and remediation. Part of any investigation involves considering the actor behind the intrusion. Security and forensics teams have to go down a couple of different avenues here, according to Gerchow.
“Is the threat actor a former employee or insider threat? You always have to go down that path, no matter how much you trust people,” says Gerchow.
In this case, the Sumo Logic team was able to determine an insider was not responsible for the intrusion, but the identity of the external actor remains a mystery.
Communicating Internally and Externally
A security incident response requires participation from multiple stakeholders, both internal and external, and everything moves fast. Without effective communication, incident response efforts can easily flounder.
Sumo Logic divided its internal communications on the incident between two Slack channels: one operational and the other for significant updates on the response.
“You could opt into this Slack channel that is view-only that'll give you updates as to what significant progress takes place,” Gerchow details. “Now, in the meantime there's another Slack channel where all the actual work is being done and audited, archived, recorded, which helps lead to that postmortem.”
That division helped keep internal stakeholders apprised of the situation while allowing the work being done in the trenches to get done without distractions. Gerchow also sat on evening calls with executives, giving them a chance to engage and ask questions.
Any organization that has been intruded upon knows that internal communication is only part of the battle. Sumo Logic also needed to inform its customers.
Gerchow shares that the team was working to understand how far the threat actor had gotten and to determine what action items to share before taking that step.
“When you're … living in the fire, you definitely want to get something out there as fast as possible. But you have a lot of stakeholders. You have legal involved. You have finance involved and the rest of the [executive] staff and the board,” he says. “To get to that point … of being able to do external comms took roughly about three days.”
Kim sent out an email to customers and partners, directing them to Sumo Logic’s security response center. The company posted its first communication on the incident on Nov. 7, 2023, and followed it with nearly daily updates on the investigation and actions for customers to take. The company urged its customers to rotate Sumo Logic API access keys and, as another precautionary measure, third-party credentials stored with the company as a part of webhook connection configuration.
In addition to frequent communication through the security response center, Gerchow committed to 15-minute phone calls with key customers that wanted more information. That offer led to about 50 of those phone calls in the ensuing business days.
On Nov. 20, the company announced that its investigation was complete; no customer data was impacted.
Using an Intrusion as a Valuable Lesson
Every security incident comes with lessons learned on an enterprise’s strengths and weaknesses. How did it happen? What went well? What could be changed and improved in a world where cyberattacks are a matter of not if but when?
At the time of the incident, Sumo Logic was working through some issues with developer rights, according to Gerchow. “We were 85% there … getting everyone on IAM roles and all these other things,” he explains. “And, of course, when you're making that kind of progress, it's always going to be that 15% that you haven't gotten to yet where you're going to get hurt.”
A developer left AWS credentials in cleartext in GitHub, which the threat actor used to gain access to Trufflehog.
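Cleartext keys of this kind are what secret scanners are built to catch. The sketch below is a simplified illustration of the idea, not Trufflehog's implementation. It assumes the widely documented shape of a long-term AWS access key ID ("AKIA" followed by 16 uppercase letters or digits); the function name and the sample file are hypothetical, and the key shown is the placeholder used in AWS documentation.

```python
import re

# "AKIA" plus 16 uppercase letters or digits. Real scanners ship many more
# detectors and usually verify candidates before reporting them.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_candidate_keys(text):
    """Return (line_number, key_id) pairs for every match in `text`."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


# Hypothetical config file contents.
sample = """\
# settings for a deploy script
region = us-west-2
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
"""

hits = find_candidate_keys(sample)
```

Run as a pre-commit or CI check, a scan like this can stop a key before it reaches a shared repository. A match is only a candidate, so any hit should still be treated as exposed and rotated.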
“I challenge anyone to tell me that they're 100% secure. It just doesn't exist,” says Gerchow. “It's a never-ending journey … especially as the attack surface changes.”
Sumo Logic team members attended the AWS re:Invent conference shortly after working through the security incident. They had a chance to talk to customers and get feedback on the company’s response.
“Many of our customers had been hit with other security incidents at that point as well, not related to us at all. And, by that point, they had kind of experienced [what] comms looks like for other vendors,” says Kim. “They were super thankful [for] how proactive we were and how open we were in terms of communication with them.”
While that transparency was valued by customers, Gerchow points out that communication requires ongoing work.
“Comms are just hard. They're extremely difficult because there's so many opinions, which there should be,” he says. “There's so much pressure testing as to what are we really going to say, when are we going to say it, how do we say it? Comms can always be improved.”
While intrusion response is a team effort, both Gerchow and Kim highlighted a few key players who stood out in the wake of this incident. Gerchow praised the company’s incident commanders, one from the security side and one from the developer side.
“A lot of times those two parts of the house are siloed and maybe even have some friction in between them, but they partnered up so well,” he says. “They were really just instrumental in protecting the company from anything happening moving forward as well as providing the evidence.”
Kim called out Melissa Beck, the company’s head of global communications, for her role in managing external comms. And he highlighted the company’s chief legal officer, Todd Hanna.
“Whenever we would come to a conclusion on the call, he would poke and prod to make sure that we looked at it from every single angle,” Kim explains.
Time is a valuable, and extremely limited, resource when responding to a security incident. Third parties, like forensics teams, come in to help enterprises conduct a swift investigation and remediation, but those external teams need time to acclimate to an enterprise’s systems and tooling.
“I … recommend the extra spend and knowing who you're going to work with ahead of time and bringing them into those tabletop practice scenarios so then that way the onboarding time and acclimation time are shorter,” says Gerchow.
While time is a precious resource, it can pay off to take a few minutes to slow down to ensure that vital steps are not missed.
“[Take] five to 10 minutes and just write down the steps in terms of what we probably should do if we had an infinite amount of time to get everything completed,” Kim recommends. “That gives you a level of clarity to make sure that you didn't miss anything in terms of the steps that you are running through when you're running through it 10,000 miles an hour.”
Many things need to happen at once while investigating and remediating an incident, which creates an incredibly stressful situation for the people involved. Kim cautions against attempting to dig into retros during the initial response process.
“Let's look at this credential. Does it need to be locked down? What are all the steps that [you] need to take to be safe first before you go back and do retros?” he says. “If you separate those two things, typically that … can help with handling some of the stress.”
Following the incident, Sumo Logic made a number of changes. “It was all really about more defense in depth, really looking at privileged access management, getting better developer best practices in place,” says Gerchow.
Cybersecurity intrusions and attacks can teach enterprise leaders a lot about the integrity and structure of their defenses and incident response, but they also hold some very human lessons.
Even seasoned security professionals can feel the stress and pressure of responding to an intrusion or an attack, but remaining calm is important for them and for their teams. Gerchow is now head of trust at developer data platform MongoDB, but he carries lessons from the various security incidents he has faced during his career.
“If the organization, in particular internally, sees you getting overly emotional, starting to lose it, or showing chinks in the armor, you're going to be in trouble. Same thing externally,” he says.
After coming out on the other side, everyone naturally wants to move forward. But don’t forget how hard these incidents are on the people involved, whether your team or outside of the organization. “Have some empathy because we've all been here and … we're all going to go through this at one point or another,” Gerchow cautions.