How Rippling runs IT: Streamlining AWS access

Published

Feb 27, 2024

In this blog post, we share how we combined Rippling and AWS IAM Identity Center to streamline AWS access. Our primary goal? To simultaneously supercharge our developers' productivity and bolster our security posture. To give you a sense of scale, we currently manage 34 AWS accounts, cater to 50+ teams, and provide more than 1,000 users and employees with direct or indirect access to AWS resources or IAM Identity Center-managed apps on a daily basis.

Join us as we walk you through our transition to AWS IAM Identity Center. You'll get a behind-the-scenes look at the key decisions that made this successful: how we organized multiple AWS accounts and set up Just-In-Time access, break glass access, monitoring, and more.

The problem: Rapid growth and 34 AWS accounts

In its early days, Rippling operated with a handful of AWS accounts and a workforce of fewer than 50 employees. At that time, we had set up a simple approach: IAM Federation using Rippling as the Identity Provider (IdP) to support Single Sign-On (SSO) into AWS accounts. This process involved adding users to groups, assigning roles, and accessing resources.

Fig 1: AWS Single Sign-On (SSO) with IAM Federation

However, as Rippling expanded over the last few years, our employee count surged from 1,000 to 2,000+. Over the same period, the number of AWS accounts we managed grew from six to 34. We can attribute this growth in AWS accounts to several factors: our multi-region expansion, the maturation of our environments (including development, testing, and production), and the addition of more products in the Rippling ecosystem. Each of these accounts had to be set up separately for SSO, amplifying our operational complexity. In parallel, the many new features added to our product suite made our AWS account and permission structures significantly more complex, as illustrated by the following developments:

  • As we expanded, our org chart required multiple groups and sub-teams within those groups. Each team had different responsibilities, requiring distinct IAM permissions. From a security standpoint, we aimed to allow our teams to use only the necessary services and tools to do their jobs effectively. This strategy is straightforward when dealing with one or two accounts. However, it becomes complex when managing many AWS accounts.
  • We also adopted the best practice of using multiple, granular AWS accounts, which contributed to the time-consuming management of IAM permissions across accounts.
  • We had to maintain an in-house tool to provision command line interface (CLI) access to AWS for our users. The role entitlement setup was complicated and not managed within an Infrastructure-as-Code paradigm.

The solution: IAM Identity Center and Just-In-Time access

Given the increasing complexity of managing multiple AWS accounts, we decided to transition to IAM Identity Center. Federating with AWS IAM Identity Center allows us to establish a seamless sign-in process via Rippling to AWS. This approach offers a unified method to manage access to the AWS console, AWS command line interface, and applications powered by AWS IAM Identity Center across all our AWS Organization’s accounts.

Fig 2: AWS Single Sign-On (SSO) using IAM Identity Center

We could have used IAM Identity Center to provide feature parity with our legacy system and stopped there. Instead, we leveraged this transition to reimagine our AWS identity system, contemplating the security implications of persistent access and exploring possibilities like Just-In-Time access to control when and why access is invoked, thus laying the foundation for world-class cloud security.

For the rest of this post, we'll discuss the pivotal choices that transformed this from merely replicating legacy features to establishing a benchmark for top-notch cloud security.

Key #1: Set up an AWS Organization structure and establish a naming convention

Our initial step to create an efficient system was to set up a proper AWS Organization structure. This structure forms the foundation to better manage and scale resources.

Fig 3: AWS Organization structure

Furthermore, to reinforce our organizational framework, we also established a consistent naming convention for AWS accounts, user groups in Rippling's IAM Identity Center, and Permission Sets.

Fig 4: AWS Account Naming Convention

The combination of well-structured AWS Organizations and an appropriate naming convention simplifies and automates workflows at scale.

Key #2: Federate IAM Identity Center with Rippling

At Rippling, we believe in the value of "dogfooding"—using our own products to demonstrate their effectiveness and to better understand our users' experience. So, we used Rippling's Identity and Access Management (IDM) product as the source of truth for identity in IAM Identity Center.

Rippling's Apps team built an integration with IAM Identity Center, which supports the SCIM protocol. Then, we installed the app in our Rippling account just like our customers would. Installing an app from the Rippling App Shop lets you configure a set of smart rules that define who should get access to the service.

This leads to several advantages:

  • Since Rippling is the source of truth for HR data, we automatically provision and de-provision roles, assignments, and trust configurations across multiple AWS accounts—ensuring new hires have the right access on day one.
  • The smart rules ensure each employee has the correct level of access, using fine-grained rules that can reference any of the attributes in Rippling's employee graph. For example, you can define access based on an employee's team, level, tenure, or even whether they've successfully passed your security training course.
  • Federating with AWS IAM Identity Center provides a seamless Single Sign-On process from Rippling to AWS.
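To make the idea of attribute-based smart rules concrete, here is a minimal Python sketch. It is not Rippling's rule engine: the `Employee` fields, the helper names, the team name, and the 30-day tenure threshold are all assumptions chosen for illustration. Only the group name is taken from this post.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Employee:
    # Hypothetical subset of employee attributes a rule might reference.
    name: str
    team: str
    level: int
    start_date: date
    passed_security_training: bool

Rule = Callable[[Employee], bool]

def on_team(team: str) -> Rule:
    return lambda e: e.team == team

def min_tenure_days(days: int, today: date) -> Rule:
    return lambda e: (today - e.start_date).days >= days

def passed_training() -> Rule:
    return lambda e: e.passed_security_training

def all_of(*rules: Rule) -> Rule:
    # A composite rule that holds only when every sub-rule holds.
    return lambda e: all(rule(e) for rule in rules)

def groups_for(employee: Employee, group_rules: Dict[str, Rule]) -> List[str]:
    # Groups the employee should be provisioned into, in stable order.
    return sorted(g for g, rule in group_rules.items() if rule(employee))

today = date(2024, 2, 27)
group_rules = {
    "Rippling-DataEngineer": all_of(
        on_team("Data Engineering"),   # assumed team name
        passed_training(),
        min_tenure_days(30, today),    # assumed threshold
    ),
}
alice = Employee("alice", "Data Engineering", 4, date(2023, 1, 1), True)
bob = Employee("bob", "Data Engineering", 3, date(2023, 6, 1), False)
print(groups_for(alice, group_rules), groups_for(bob, group_rules))
```

Because membership is recomputed from attributes, a change in HR data (a team move, a failed training) changes the group list without anyone editing AWS directly.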

Key #3: Use Terraform modules for permission sets and account assignments

To further streamline the process, we employed Terraform modules to automate the creation of permission sets and account assignments. This not only reduced manual intervention but also minimized the risk of potential errors and supported an auditable change management process through the approval of pull requests.

Here are some examples of this in practice:

  • Users from the group Rippling-Security-Infra should be assigned the permission set SecurityAudit in all accounts in the Rippling AWS Organization.
  • Users from the group Rippling-DataEngineer should be assigned the permission set DataEngineering in the prod-datalake-dataeng-us1 account.
```hcl
#-----------------------------------------------------------------------------------------------------------------------
# CREATE PERMISSION SETS
#-----------------------------------------------------------------------------------------------------------------------
module "security_infra_permission_sets" {
  source = "../../../modules/aws/aws-sso-permission-set"

  permission_sets = [
    {
      name                                = "SecurityAudit",
      description                         = "Provides Security-Audit access to AWS services and resources.",
      relay_state                         = "",
      session_duration                    = "PT2H",
      tags                                = {},
      inline_policy_attachment            = "",
      managed_policy_attachments          = ["arn:aws:iam::aws:policy/SecurityAudit"],
      customer_managed_policy_attachments = []
    },
  ]
}

#-----------------------------------------------------------------------------------------------------------------------
# CREATE ACCOUNT ASSIGNMENTS
#-----------------------------------------------------------------------------------------------------------------------

# Security-Audit access for Rippling-Security-Infra group to all active accounts in Rippling AWS Organization
module "security_infra_sso_account_assignments_security_audit_access" {
  source = "../../../modules/aws/aws-sso-account-assignments"

  account_list       = local.all_active_accounts
  permission_set_arn = module.security_infra_permission_sets.permission_sets["SecurityAudit"].arn
  principal_type     = "GROUP"
  principal_name     = "Rippling-Security-Infra"
}
```
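Conceptually, an account-assignment module turns one (group, permission set, account list) declaration into one assignment per account, and a plan shows the difference between the desired and current state. The Python sketch below illustrates that expansion and diff. It is an assumption about the general pattern, not the internals of the modules above, and the account IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

@dataclass(frozen=True)
class Assignment:
    account_id: str
    permission_set: str
    principal_type: str
    principal_name: str

def expand_assignments(account_ids: Iterable[str], permission_set: str,
                       principal_name: str, principal_type: str = "GROUP") -> Set[Assignment]:
    # One assignment per distinct account; duplicates collapse in the set.
    return {Assignment(a, permission_set, principal_type, principal_name)
            for a in account_ids}

def plan(desired: Set[Assignment], current: Set[Assignment]) -> Tuple[Set[Assignment], Set[Assignment]]:
    # (to_create, to_delete): the kind of diff a reviewer approves in a pull request.
    return desired - current, current - desired

all_active_accounts = ["111111111111", "222222222222", "333333333333"]  # placeholder IDs
desired = expand_assignments(all_active_accounts, "SecurityAudit", "Rippling-Security-Infra")
current = expand_assignments(["111111111111"], "SecurityAudit", "Rippling-Security-Infra")
to_create, to_delete = plan(desired, current)
print(sorted(a.account_id for a in to_create))
```

Adding a new account to the list then shows up as a single new assignment in the diff, which is what keeps the change auditable.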

Key #4: Set up Just-In-Time access infrastructure

In the legacy setup, certain groups and individuals were granted persistent access to critical AWS resources. While this approach streamlined operational processes, it also exposed us to greater risk if user credentials were compromised—so we embarked on a journey to a Just-In-Time access model.

Picture this: Users receive temporary access to resources if and when they need them to perform specific tasks. Once they complete the task, access is promptly revoked. This simple yet powerful concept mitigates many potential security risks associated with continuous access.

To put this concept into action, we integrated the open-source tool common-fate to build Just-In-Time access infrastructure. At the heart of this infrastructure is a meticulously crafted access matrix, carefully tailored to assign specific access levels based on task requirements and security considerations.

Here's how the access matrix works:

| Access Level | Permissions | Access Type | Approver |
| --- | --- | --- | --- |
| Level 1 | Includes access required for routine tasks such as viewing general project details and accessing resources necessary for standard work functions | Persistent access | No approval required (permissions auto-assigned based on role) |
| Level 2 | Incorporates complete read and write access to non-sensitive data, including the ability to modify non-critical configurations | Just-in-Time access | Peers or team members |
| Level 3 | Emergency administrator access | Just-in-Time access | Infrastructure managers and team leads |

For high-level tasks that entail greater risks, we provide temporary elevated access. We grant Just-In-Time (JIT) access for a short duration, thereby minimizing the risk of unauthorized access or exploitation of critical resources. On the other hand, for lower-level tasks with lower risk profiles, we retained the conventional approach of persistent access, ensuring smooth and uninterrupted operations.
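The access matrix can be read as a lookup from level to access type, approver, and maximum duration. The Python sketch below models that lookup and a time-bounded grant. It is illustrative only and does not use the common-fate API: the durations, the approver labels, and the rule that requesters cannot approve themselves are assumptions, not details from this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass(frozen=True)
class LevelPolicy:
    jit: bool                          # False means persistent access
    approver: Optional[str]            # None means no approval required
    max_duration: Optional[timedelta]  # None means no expiry

ACCESS_MATRIX = {
    1: LevelPolicy(jit=False, approver=None, max_duration=None),
    2: LevelPolicy(jit=True, approver="peer", max_duration=timedelta(hours=2)),     # assumed duration
    3: LevelPolicy(jit=True, approver="manager", max_duration=timedelta(hours=1)),  # assumed duration
}

@dataclass(frozen=True)
class Grant:
    user: str
    level: int
    expires_at: Optional[datetime]

    def is_active(self, now: datetime) -> bool:
        return self.expires_at is None or now < self.expires_at

def request_access(user: str, level: int, now: datetime,
                   approved_by: Optional[str] = None) -> Grant:
    policy = ACCESS_MATRIX[level]
    if not policy.jit:
        return Grant(user, level, None)  # Level 1: persistent, auto-assigned
    if approved_by is None or approved_by == user:
        # Assumption: an approver other than the requester is required.
        raise PermissionError("JIT access requires approval from someone else")
    return Grant(user, level, now + policy.max_duration)

now = datetime(2024, 2, 27, 12, 0, tzinfo=timezone.utc)
grant = request_access("alice", 2, now, approved_by="bob")
print(grant.is_active(now + timedelta(hours=1)), grant.is_active(now + timedelta(hours=3)))
```

The key property is that elevated grants carry their own expiry, so revocation does not depend on anyone remembering to remove access.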

Fig. 5: A logical flowchart for temporary elevated access

The integration of the Just-In-Time access infrastructure has revolutionized our security landscape. It has enabled us to create a more robust environment with minimized attack surfaces, a lower risk of breaches, and a reduced likelihood of human errors. All of this has been achieved while maintaining operational efficiency, marking a significant step forward in our ongoing commitment to secure and efficient management of cloud resources.

Key #5: Set up break glass access

In order to ensure business continuity in disaster scenarios, we established a "break glass" access protocol designed for situations where instant, unrestricted access is critical. These emergencies might include, for example:

  • The failure of Rippling IdP
  • An issue involving IAM Identity Center

In line with AWS's recommendations, we've created two IAM users in our AWS Organization Management account with the AdministratorAccess policy attached. These IAM users use FIDO security keys for multi-factor authentication (MFA). Additionally, we've established a "break glass" role with access to all accounts within the organization. This role can only be assumed by the designated "break glass" users from the management account via trust policies. Any break glass access not only triggers alerts but also pages our dart team, which provides a good segue into the next section: monitoring.
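A trust policy that limits a role to specific IAM users in the management account could look roughly like the one this Python sketch generates. The account ID and user names are placeholders, and the MFA condition on the trust policy is an assumption added for illustration; the post only states that the users authenticate with FIDO keys.

```python
import json
from typing import Dict, List

MANAGEMENT_ACCOUNT_ID = "111111111111"                  # placeholder account ID
BREAK_GLASS_USERS = ["break-glass-1", "break-glass-2"]  # hypothetical user names

def break_glass_trust_policy(management_account_id: str, user_names: List[str]) -> Dict:
    # Only the named IAM users in the management account may assume the role.
    principals = [
        f"arn:aws:iam::{management_account_id}:user/{name}" for name in user_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBreakGlassUsersOnly",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "sts:AssumeRole",
                # Assumption: also require that the session was MFA-authenticated.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }

policy = break_glass_trust_policy(MANAGEMENT_ACCOUNT_ID, BREAK_GLASS_USERS)
print(json.dumps(policy, indent=2))
```

Generating the document from a short list of user names keeps the set of principals easy to review, since any change to it is a one-line diff.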

Key #6: Logging, monitoring, and detections

The logging, monitoring, and detection system is an indispensable component of our architecture that tracks user activity and delivers a real-time snapshot of system operations. By identifying anomalous patterns, it equips us with the means to promptly address potential security breaches and system malfunctions.

Our first step toward constructing effective detections for anomalous behavior was to delineate 'normal' behavior. This was based on a user's role, access level, and common patterns of activity. With this standard baseline established, we created detection protocols in our proprietary SOAR (Security Orchestration, Automation, and Response) platform, aptly named 'Cheetah.' We'll talk more about Cheetah in a future blog post, so stay tuned!

Automated alerts trigger on any significant deviation, such as an attempt to sign in as a break glass user or to delete the break glass role from any account. Notifications go out immediately through channels like Slack and Opsgenie, so our incident response team learns of potential issues right away.
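A detection of this kind can be thought of as a predicate over CloudTrail events. The Python sketch below is not Cheetah, whose rules are not described in this post. It matches on standard CloudTrail fields (`eventName`, `userIdentity`, `requestParameters`), while the user names and role name are hypothetical.

```python
from typing import Optional

BREAK_GLASS_USERS = {"break-glass-1", "break-glass-2"}  # hypothetical user names
BREAK_GLASS_ROLE = "break-glass"                        # hypothetical role name

def detect_break_glass(event: dict) -> Optional[str]:
    """Return an alert message for break-glass-related activity, else None."""
    name = event.get("eventName", "")
    identity = event.get("userIdentity") or {}
    params = event.get("requestParameters") or {}

    # Someone tries to remove the emergency role from an account.
    if name == "DeleteRole" and params.get("roleName") == BREAK_GLASS_ROLE:
        return "break glass role deletion attempted"
    # Someone assumes the emergency role.
    role_arn = str(params.get("roleArn", ""))
    if name == "AssumeRole" and role_arn.endswith(":role/" + BREAK_GLASS_ROLE):
        return "break glass role assumed"
    # Any other activity by a break glass IAM user.
    if identity.get("type") == "IAMUser" and identity.get("userName") in BREAK_GLASS_USERS:
        return f"break glass user {identity['userName']} called {name}"
    return None

sample = {
    "eventName": "ConsoleLogin",
    "userIdentity": {"type": "IAMUser", "userName": "break-glass-1"},
    "requestParameters": None,
}
print(detect_break_glass(sample))
```

In practice the returned message would be routed to a paging or chat integration; that wiring is omitted here.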

Key #7: Implement proactive controls

Security measures are not just about reactive strategies; they also demand a proactive approach. With this in mind, we've gone a step further to implement preventive controls that keep potential threats at bay, utilizing the capabilities of AWS service control policies (SCPs).

The following are some examples of SCPs we enabled in our environment:

1. We enacted policies that prevent accounts from leaving the organization.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "organizations:LeaveOrganization"
      ],
      "Resource": "*"
    }
  ]
}
```

2. We restricted access to specific AWS regions to reduce the surface area susceptible to potential attacks.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnusedRegions",
      "Effect": "Deny",
      "NotAction": [
        "a4b:*",
        "acm:*",
        "aws-marketplace-management:*",
        "aws-marketplace:*",
        "aws-portal:*",
        "budgets:*",
        "ce:*",
        "chime:*",
        "cloudfront:*",
        "config:*",
        "cur:*",
        "directconnect:*",
        "ec2:DescribeRegions",
        "ec2:DescribeTransitGateways",
        "ec2:DescribeVpnGateways",
        "fms:*",
        "globalaccelerator:*",
        "health:*",
        "iam:*",
        "importexport:*",
        "kms:*",
        "mobileanalytics:*",
        "networkmanager:*",
        "organizations:*",
        "pricing:*",
        "route53:*",
        "route53domains:*",
        "route53-recovery-cluster:*",
        "route53-recovery-control-config:*",
        "route53-recovery-readiness:*",
        "s3:GetAccountPublic*",
        "s3:ListAllMyBuckets",
        "s3:ListMultiRegionAccessPoints",
        "s3:PutAccountPublic*",
        "shield:*",
        "sts:*",
        "support:*",
        "trustedadvisor:*",
        "waf-regional:*",
        "waf:*",
        "wafv2:*",
        "wellarchitected:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "eu-central-1",
            "eu-west-1",
            "us-west-2",
            "us-east-1",
            "ap-southeast-1",
            "ap-southeast-2"
          ]
        }
      }
    }
  ]
}
```
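The region policy combines `NotAction` with a `StringNotEquals` condition, which can be hard to read at first: a request is denied only when its action is not on the exempt list and its region is not on the allowed list. The Python sketch below mimics that logic for a single statement. It is a simplified model for reasoning, not a reimplementation of AWS policy evaluation, and `EXEMPT_ACTIONS` is only a subset of the list above.

```python
from fnmatch import fnmatchcase

ALLOWED_REGIONS = {
    "eu-central-1", "eu-west-1", "us-west-2",
    "us-east-1", "ap-southeast-1", "ap-southeast-2",
}
# Subset of the NotAction list, kept short for illustration.
EXEMPT_ACTIONS = ["iam:*", "sts:*", "route53:*", "ec2:DescribeRegions", "s3:GetAccountPublic*"]

def is_exempt(action: str) -> bool:
    # IAM action names are case-insensitive, so compare in lowercase.
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in EXEMPT_ACTIONS)

def is_denied(action: str, requested_region: str) -> bool:
    # Deny = action NOT in NotAction AND region NOT in the allowed list.
    return not is_exempt(action) and requested_region not in ALLOWED_REGIONS

for action, region in [
    ("ec2:RunInstances", "sa-east-1"),
    ("ec2:RunInstances", "us-east-1"),
    ("iam:CreateRole", "sa-east-1"),
]:
    print(action, region, "denied" if is_denied(action, region) else "not denied by this SCP")
```

Note that "not denied" here only means this statement does not block the call; the caller still needs an identity policy that allows it.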

3. We made it so that users can't disable crucial security tools like Amazon GuardDuty, AWS Config, and AWS CloudTrail. The CloudTrail statements are shown here as an example.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail"
      ],
      "Resource": "*",
      "Effect": "Deny"
    }
  ]
}
```

4. We blocked the creation of IAM users to prevent unchecked privilege escalation and the potential for insider threats.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "iam:CreateUser",
        "iam:CreateAccessKey"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Deny"
    }
  ]
}
```

Overall, these proactive controls act as a frontline defense, hardening our security posture and preempting potential issues before they even arise.

Key #8: Phased team onboarding using a data-driven approach

To ensure a smooth transition across the organization, we took a phased approach, enabling IAM Identity Center access progressively for teams before removing the legacy IAM users for SAML 2.0 integration. We also knew we needed to be data-driven during the rollout, so we set up dashboards showing migration progress by team and AWS account. We ingested data from AWS CloudTrail into Snowflake and then configured Snowflake as the data source in Amazon QuickSight. These dashboards allowed us to monitor the migration process, highlight milestones, and detect issues in specific teams or accounts that might require further attention.
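One way to compute a migration metric from CloudTrail data is to classify each session by its role ARN, since IAM Identity Center provisions roles whose names start with `AWSReservedSSO_`. The Python sketch below does this over flattened rows. The row shape (`team`, `account_id`, `user_identity_arn`) and the sample values are assumptions about what a warehouse table might contain, not the actual schema or dashboard query used here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def is_identity_center_session(user_identity_arn: str) -> bool:
    # Identity Center role sessions look like
    # arn:aws:sts::<account>:assumed-role/AWSReservedSSO_<PermissionSet>_<suffix>/<user>
    return ":assumed-role/AWSReservedSSO_" in user_identity_arn

def migration_progress(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Percent of events per (team, account) that came through Identity Center."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    migrated: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        key = (row["team"], row["account_id"])
        totals[key] += 1
        if is_identity_center_session(row["user_identity_arn"]):
            migrated[key] += 1
    return {key: round(100 * migrated[key] / totals[key], 1) for key in totals}

rows = [  # hypothetical sample data
    {"team": "payments", "account_id": "111111111111",
     "user_identity_arn": "arn:aws:sts::111111111111:assumed-role/AWSReservedSSO_DataEngineering_abc123/alice"},
    {"team": "payments", "account_id": "111111111111",
     "user_identity_arn": "arn:aws:sts::111111111111:assumed-role/AWSReservedSSO_DataEngineering_abc123/bob"},
    {"team": "payments", "account_id": "111111111111",
     "user_identity_arn": "arn:aws:sts::111111111111:assumed-role/legacy-saml-role/carol"},
    {"team": "search", "account_id": "222222222222",
     "user_identity_arn": "arn:aws:sts::222222222222:assumed-role/legacy-saml-role/dave"},
]
print(migration_progress(rows))
```

A team sitting well below 100% near the end of its phase is a signal to check for workflows that still depend on the legacy path.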

Unlocking new use cases

The migration to IAM Identity Center and the adoption of common-fate for Just-In-Time (JIT) access yielded numerous security and operational advantages. However, three use cases stand out for their impact on Rippling's operations and security posture:

  • Just-In-Time access to Secrets: One of the vital aspects of our move to IAM Identity Center was the establishment of JIT access to AWS Secrets Manager resources. By tagging these resources with the team name, we created a JIT policy, which stipulates that only members of a particular team can request JIT access to secrets owned by their team. This framework allows for secure and controlled modifications or updates to the secrets, thereby reinforcing the principle of least privilege in our operations.
  • EC2 Shell Access via AWS IAM Identity Center and AWS Systems Manager: Our new approach also streamlined EC2 shell access, making it magically fast for users. We use Rippling as the Identity Provider (IdP) and AWS IAM Identity Center for strong, centrally managed authentication, which requires WebAuthn for multi-factor authentication. Once authenticated, users assume a dedicated, minimally privileged IAM role and use short-lived AWS access tokens to initiate a shell session to a CDE (EC2 instance) via AWS Systems Manager Session Manager. This end-to-end process ensures strong user authentication and minimizes security risks.
  • Just-In-Time EKS power user access: In our ongoing quest to balance operational efficiency with strict security controls, we crafted a 'power-user' role specific to our EKS clusters. This role bestows both infrastructure and non-infrastructure team members with elevated access privileges on a Just-In-Time basis across our production and non-production environments, specifically designed for emergency situations. Given the extensive access this role allows within our production system, it’s reserved exclusively for engineers with substantial Kubernetes experience and a legitimate business justification. Through this, we've fashioned a mechanism that simultaneously upholds rigorous control over our environment and facilitates task execution for team members, all while minimizing bureaucratic hurdles.
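The team-scoped rule for secrets in the first use case can be sketched as a tag comparison: a user may request JIT access to a secret only if the secret's team tag matches one of the user's teams. The Python sketch below is illustrative; the tag key `team`, the secret names, and the data shapes are assumptions rather than the actual policy definition.

```python
from typing import Dict, List, Set

TEAM_TAG_KEY = "team"  # assumed tag key

def can_request_secret_access(user_teams: Set[str], secret_tags: Dict[str, str]) -> bool:
    # Untagged secrets are never requestable, so ownership must be explicit.
    owner = secret_tags.get(TEAM_TAG_KEY)
    return owner is not None and owner in user_teams

def requestable_secrets(user_teams: Set[str], secrets: Dict[str, Dict[str, str]]) -> List[str]:
    # Names of the secrets this user could open a JIT request for.
    return sorted(name for name, tags in secrets.items()
                  if can_request_secret_access(user_teams, tags))

secrets = {  # hypothetical secrets and tags
    "payments/db-password": {"team": "payments"},
    "search/api-key": {"team": "search"},
    "untagged/legacy-token": {},
}
print(requestable_secrets({"payments"}, secrets))
```

Denying untagged secrets by default means a missing tag fails closed instead of silently widening access.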

Results

This project has had a transformative effect on our AWS access strategy, streamlining everything from large-scale identity and access management to the development of a comprehensive detection and logging framework. The benefits we've experienced are quantifiable and impactful, as demonstrated by the following key metrics:

  • Complete centralization: 100% of AWS authentication traffic for humans now routes through IAM Identity Center.
  • Elimination of persistent access: We've achieved a 100% reduction in persistent access to critical data and resources in AWS. This means no permanent access to critical resources is granted, which reduces potential security vulnerabilities and aligns with the principle of least privilege.
  • Widespread JIT adoption: With all 53 product teams now using Just-In-Time (JIT) access to manage their secrets in AWS Secrets Manager, we have not only increased security but also boosted operational efficiency.
  • 3x increase in Infra team productivity for secrets management: This was calculated based on the number of AWS accounts and custom workflows the Infra team used to manage with the old setup.
  • No more IAM users for internal use: We have entirely eradicated the use of IAM users in our AWS accounts for internal use cases, minimizing the risk of privilege escalation and potential insider threats. The only exceptions are a small number of IAM users that remain for third-party integrations where other mechanisms aren’t supported and two IAM users for break glass access in the management account. To further secure these IAM users, we have implemented robust logging, monitoring, and alerting on their usage. They’re confined to having only the limited permissions essential for the integrations. Additionally, we’re actively pursuing the migration of these users to IAM Roles Anywhere, wherever possible, to fortify our security posture. We continue to be diligent in our ongoing efforts to uphold the highest security standards.
last edited: February 27, 2024

Author

Shreyas Damle

Sr. Security Engineer

Shreyas is the Senior Security Engineer for Rippling's Infrastructure Security Team, safeguarding systems by building and implementing robust security controls and tools to protect critical infrastructure.