Customer Segmentation using RFM Analysis (using R)


An eCommerce business wants to target customers who are likely to become inactive. In this article, I will use a grouping technique called customer segmentation to group customers by their purchase activity.

It is an old business adage that about 80 percent of your sales come from 20 percent of your customers. You are in business largely because of the support of a fraction of your customer base: your best customers. Finding the right customers is therefore a main reason behind the success of a business, and it raises two questions:

  1. How should the customers in the database be segmented to find the ones with a higher potential to respond to mailings or buy products?
  2. To which kinds of customers should salespeople send mailings so that the business can break even and make a profit?

The RFM method is a very effective method of customer analysis for answering these kinds of questions.

What is the RFM Model?

The Recency, Frequency, & Monetary (RFM) Model is a classic analytics and segmentation tool for identifying your best customers. RFM stands for the three dimensions:
Recency – How recently did the customer purchase?
Frequency – How often do they purchase?
Monetary Value – How much do they spend?

So RFM analysis is a marketing technique that can be used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased, how often they purchase, and how much they spend. The RFM method is used for analyzing customers' behavior and defining market segments.

Benefits of RFM segmentation:

Conducting an RFM analysis on your customer base and sending personalized campaigns to high-value targets can benefit your eCommerce store in several ways:

  • Personalized, targeted offers
  • (Much) higher response and conversion rates
  • Improve unit economics
  • Increase revenue and profits

What technique are we going to use?

RFM analysis divides customers into RFM cells along the three dimensions of R, F, and M. The resulting segments can be ordered from most valuable (most recent purchases, highest frequency and monetary value) to least valuable (least recent purchases, lowest frequency and monetary value). Keep in mind that identifying the most valuable RFM segments can capitalize on chance relationships in the data used for the analysis.

Sequential vs Independent RFM Model:

There are two kinds of RFM model: sequential and independent. The sequential model creates nested bins: a simple rank is assigned to recency values; within each recency rank, customers are then assigned a frequency rank; and within each frequency rank, customers are assigned a monetary rank. In the independent method, simple ranks are assigned to the Recency, Frequency, and Monetary values independently of one another. With independent ranks, the interpretation of each of the three RFM components is unambiguous.
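
The difference between the two schemes can be sketched in base R. The toy data frame, the helper `rank_bins()`, and the choice of two bins per dimension are assumptions made for illustration only; they are not taken from the analysis in this article.

```r
# Toy customer table (made-up values); recency is days since the last purchase
rfm <- data.frame(
  CustomerID = 1:8,
  recency    = c(5, 10, 20, 40, 80, 160, 200, 300),
  frequenci  = c(9, 8, 7, 6, 4, 3, 2, 1),
  monitery   = c(500, 50, 400, 30, 900, 80, 300, 60)
)

# Hypothetical helper: split x into n roughly equal-sized rank bins (1 = lowest)
rank_bins <- function(x, n) {
  ceiling(rank(x, ties.method = "first") * n / length(x))
}

n_bins <- 2

# Independent binning: each dimension is ranked over all customers.
# Recency is negated so that recent buyers get the higher score.
rfm$R_ind <- rank_bins(-rfm$recency, n_bins)
rfm$F_ind <- rank_bins(rfm$frequenci, n_bins)
rfm$M_ind <- rank_bins(rfm$monitery, n_bins)

# Sequential (nested) binning: frequency is ranked within each recency bin,
# and monetary value within each recency-by-frequency cell.
rfm$R_seq <- rfm$R_ind
rfm$F_seq <- ave(rfm$frequenci, rfm$R_seq,
                 FUN = function(x) rank_bins(x, n_bins))
rfm$M_seq <- ave(rfm$monitery, rfm$R_seq, rfm$F_seq,
                 FUN = function(x) rank_bins(x, n_bins))

print(rfm)
```

In this toy data, frequent buyers are also the recent ones, so the independent frequency score largely repeats the recency score, while the sequential score compares each customer only with others in the same recency bin.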

K-means Clustering vs RFM Analysis:

K-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure classifies a given data set into a certain number of clusters (say, k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different starting locations lead to different results; a good choice is to place them as far away from each other as possible. The next step is to take each point in the data set and associate it with the nearest center. In RFM analysis, on the other hand, customers are segmented into similar clusters according to their RFM values. The characteristics of each cluster then help to identify and retain profitable and loyal customers and to develop an effective marketing strategy for each cluster of customers.
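
For comparison, here is a minimal sketch of how k-means could be applied to RFM values with the `kmeans()` function from base R. The data are made up, and the choice of k = 3 and of scaling the columns first are assumptions; this article itself uses quantile-based RFM scores rather than k-means.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Toy RFM table with made-up values (one row per customer)
rfm <- data.frame(
  recency   = c(3, 5, 8, 200, 220, 250, 90, 100, 110),
  frequenci = c(20, 22, 21, 1, 2, 1, 6, 5, 7),
  monitery  = c(900, 950, 1000, 40, 35, 20, 300, 280, 350)
)

# Put the three dimensions on a common scale so that no single one dominates
rfm_scaled <- scale(rfm)

# k = 3 is an arbitrary choice here; nstart reruns the algorithm from several
# random starting centers and keeps the best solution
fit <- kmeans(rfm_scaled, centers = 3, nstart = 25)

rfm$cluster <- fit$cluster

# Average RFM profile of each cluster
print(aggregate(cbind(recency, frequenci, monitery) ~ cluster,
                data = rfm, FUN = mean))
```

The cluster numbers returned by `kmeans()` carry no order, so the average profile of each cluster has to be inspected to decide which one holds the best customers.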

About data:

This is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. There are 541910 records and 8 columns, and 4338 unique customer IDs.

Steps for RFM model:

Step 1: Loading the dataset

Step 2: Data cleaning and removing missing values

Step 3: Calculating Recency, Frequency and Monetary value for each customer

Step 4: Calculating R_score, F_score and M_score and the final RFM score

Step 5: Creating the segments based on RFM score using quantile function
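
Steps 1 and 2 might look roughly like the sketch below. It assumes the eight columns are InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID and Country. The file name, the made-up sample rows, the filter on non-positive quantities and prices, and the definition of `total_dolar` as quantity times unit price are all assumptions made for illustration.

```r
# Step 1: load the data. With the real file this could be something like
#   df_data <- read.csv("online_retail.csv", stringsAsFactors = FALSE)
# (hypothetical file name). A small made-up sample stands in for it here.
df_data <- data.frame(
  InvoiceNo   = c("100001", "100001", "100002", "C100003", "100004", "100005"),
  StockCode   = c("A1", "B2", "C3", "D", "E5", "F6"),
  Description = c("CANDLE HOLDER", "LANTERN", "HAND WARMER",
                  "Discount", "JAM SET", "TEA SET"),
  Quantity    = c(6, 6, 6, -1, 24, 23),
  InvoiceDate = c("2010-12-01", "2010-12-01", "2010-12-01",
                  "2010-12-01", "2010-12-02", "2010-12-02"),
  UnitPrice   = c(2.55, 3.39, 1.85, 27.50, 1.45, 4.25),
  CustomerID  = c(17850, 17850, 17850, 14527, NA, 15311),
  Country     = "United Kingdom",
  stringsAsFactors = FALSE
)

# Step 2: clean. Drop rows without a customer ID, and (as an assumption)
# rows with non-positive quantity or price, such as returns and adjustments.
df_data <- df_data[!is.na(df_data$CustomerID) &
                   df_data$Quantity > 0 & df_data$UnitPrice > 0, ]

# Parse the date; the real file may store timestamps in another format,
# in which case the format argument of as.Date() needs adjusting
df_data$InvoiceDate <- as.Date(df_data$InvoiceDate)

# Amount spent per invoice line (assumed definition of total_dolar)
df_data$total_dolar <- df_data$Quantity * df_data$UnitPrice

str(df_data)
```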

Calculating Recency, Frequency and Monetary value

Find the most recent purchase date for each customer ID and calculate the number of days from it to today or to some other reference date, to get the Recency data.
Count the number of transactions of a customer, to get the Frequency data.
Sum the amount of money a customer spent to get the Monetary data; dividing this sum by Frequency gives the average amount per transaction, an alternative definition of Monetary value.

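library(dplyr)
# Assumes InvoiceDate is a Date; the latest date in the data is used as the reference date (an assumption)
df_RFM <- df_data %>%
  group_by(CustomerID) %>%
  summarise(recency = as.numeric(max(df_data$InvoiceDate) - max(InvoiceDate)),
            frequenci = n_distinct(InvoiceNo), monitery = sum(total_dolar))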

Calculating R_score, F_score and M_score using quantiles

The steps below explain the process:

  • A recency score is assigned to each customer based on the date of the most recent purchase. The score is generated by binning the recency values into a number of categories (five is a common choice). For example, if you use three categories, the customers with the most recent purchase dates receive a recency score of 3, and those with purchase dates in the distant past receive a recency score of 1.
  • A frequency score is assigned in a similar way. Customers with the highest purchase frequency are assigned the highest score (3) and those with the lowest frequency are assigned a score of 1.
  • A monetary score is assigned on the basis of the total revenue generated by the customer in the period under analysis. Customers with the highest revenue or order amount are assigned the highest score, while those with the lowest revenue are assigned a score of 1.
  • A fourth score, the RFM score, is generated by simply concatenating the three individual scores into a single value. For example, scores of 3, 2 and 1 give an RFM score of 321.
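
The scoring steps above can be sketched with the base R `quantile()` and `cut()` functions. The toy table, the helper `quantile_score()`, and the use of three bins are assumptions made for illustration; the column names follow the summary code earlier in this article.

```r
# Toy RFM table with made-up values
df_RFM <- data.frame(
  CustomerID = 1:9,
  recency    = c(2, 10, 25, 40, 60, 90, 150, 220, 300),
  frequenci  = c(12, 1, 7, 3, 9, 2, 5, 1, 4),
  monitery   = c(800, 20, 450, 90, 600, 60, 300, 15, 150)
)

n_bins <- 3

# Hypothetical helper: score x from 1 to n using quantile break points
quantile_score <- function(x, n) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
  # unique() guards against duplicated breaks when many values are tied,
  # in which case fewer than n distinct scores are produced
  as.integer(cut(x, breaks = unique(breaks), include.lowest = TRUE))
}

# Low recency (a recent purchase) should score high, so the scale is reversed
df_RFM$R_score <- (n_bins + 1) - quantile_score(df_RFM$recency, n_bins)
df_RFM$F_score <- quantile_score(df_RFM$frequenci, n_bins)
df_RFM$M_score <- quantile_score(df_RFM$monitery, n_bins)

# Concatenate the three scores into a single RFM score
df_RFM$RFM_score <- as.integer(paste0(df_RFM$R_score,
                                      df_RFM$F_score,
                                      df_RFM$M_score))

# Best customers first
print(df_RFM[order(-df_RFM$RFM_score), ])
```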


The eCommerce business should also prioritize segment outreach based on the Recency segments:

  • Hot
  • Warm
  • Cold

The eCommerce business can now create targeted marketing campaigns for customers who are close to slipping from one segment into the next, from Hot to Warm and so on. A customer who makes one expensive purchase per year and has become inactive should have a high priority. The business should target the Warm and Cold customers by phone and by email.
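
One way to label the Recency segments and to flag customers who are about to slip into a colder segment is sketched below. The article does not give cut-off values, so the 30-day and 90-day limits, the 7-day margin, and the toy data are purely hypothetical.

```r
# Toy recency values in days since the last purchase (made-up)
customers <- data.frame(
  CustomerID = 1:6,
  recency    = c(3, 26, 45, 85, 120, 400)
)

# Hypothetical cut-offs: Hot = up to 30 days, Warm = 31 to 90 days, Cold = beyond
hot_limit  <- 30
warm_limit <- 90

customers$segment <- cut(customers$recency,
                         breaks = c(-Inf, hot_limit, warm_limit, Inf),
                         labels = c("Hot", "Warm", "Cold"))

# Flag customers within 7 days (another assumed value) of slipping into the
# next, colder segment
margin <- 7
customers$at_risk <-
  (customers$segment == "Hot"  & customers$recency > hot_limit  - margin) |
  (customers$segment == "Warm" & customers$recency > warm_limit - margin)

print(customers)
```

Customers with `at_risk` set to TRUE would be natural candidates for a campaign before they move from Hot to Warm or from Warm to Cold.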



Written By:

About Kanij Fatema Aleya:
Kanij Fatema Aleya holds an M.Sc. in Computer Science. She is currently working as an Analyst Intern with NikhilGuru Consulting Analytics Service LLP (Nikhil Analytics), Bangalore.

