In this blog post, we’ll take a look at RFM (recency, frequency, monetary value) modeling, which is used to group customers or segments by behavior and how they make purchases, from large datasets. As a result, you’ll receive valuable insights for direct marketing.


What is RFM modeling and how can marketers or data analysts use it?


RFM (Recency, Frequency, Monetary) analysis is a proven marketing model for behavior-based customer segmentation. It groups customers based on their transaction history – how recently they purchased from your company, how often, and how much did those customers buy.


RFM helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions and future personalization services.


Valuing customers based on a single parameter is insufficient.


For example, you could assume that people who spend the most on your business are your best customers. Most of us agree and would have the same assumption. But hold on, what if they purchased from your business only once? Or a very long time ago? What if they are no longer using your product? Can they still be considered your best customers? Probably not. Judging customer value on just one aspect provides an inaccurate report of your customer base and their lifetime value.


That’s why the RFM model combines three different customer attributes to rank customers.


If they bought recently, they get more points. If they bought more than one time, they get a higher score. And if they spent a larger amount, they get more points as well. Combine these three scores and you’ll get the RFM score.


Finally, you can segment your customer database into different groups based on this Recency–Frequency–Monetary score.


- Recency: The minimum number of recency days

- Frequency: The number of purchases the customer has made

- Monetary: The total value of all purchases for each customer


RFM analysis is a handy method to find your best customers, understand their behavior, and then run targeted email/marketing campaigns to increase sales, satisfaction, and customer lifetime value.


Build an RFM Model for customer segmentation based on time, understand behavior, and identify churn.


In this tutorial, you will learn how to run an ML model on Datagran, using RFM analysis to evaluate customer behavior of a fast-food restaurant, by comparing last year’s data with the current one. The outcome will provide insights not only about customer behavior, but will also help to identify causes for customer churn. 



First, integrate the data source you will extract information from. Datagran imports data from various data warehouses and sources, including databases like PostgreSQL, Azure Cosmos DB, Snowflake, and more, along with business software including HubSpot, Shopify, Facebook Ads, Salesforce, Segment, Amplitude, and more.


For this tutorial, we will use a Microsoft SQL Server source to pull in data. Click here to learn how to integrate data sources into your Datagran account.





Then, create a new pipeline—what Datagran calls its workflows—to process your data.


Now, drag and drop the Microsoft SQL Server data source into the pipeline canvas.

Tip: Multiple sources can also be added into the pipeline such as transactional data from Stripe and behavioral data from a CRM, or even data from your website or App by installing our Web Pixel or mobile SDK or by using Segment.


A data pipeline dashboard


It’s time to build the ML model. First, prepare the dataset.

You will run three RFM operators-one for 2020, one for 2021, and one combining both years.


For this step, it is necessary to extract the data (table) that will be used from the data source (Microsoft SQL Server). This will set up the model so the RFM operators work with the right information.


You need transactional data to be able to perform an RFM model. Our system will then choose what is needed with the sample query we provide below. The transaction data table should include the following columns:


Tip: Make sure the values inside of your table match the ones you will input in the query. For example, if the column’s name is Unique Customer ID, then you will need to use the same capitalization.



Unique customer-id

Date of those transactions

Invoice total amount



A data pipeline dashboard


The first SQL operator will contain 2020’s user invoice data. To do this, drag and drop a MySQL operator into the canvas. 

A list of ml model operators


Operators are functions embedded in Datagran’s platform that help you interact with your data and process it. 

Hover over the first operator element and press the edit button. The SQL query editor page will pop-up. Name the operator, and copy and paste the query sample located below and replace each variable to apply to your table’s variables, or press the Show Editor button located on the top right-hand corner to choose the columns without having to add code. Replace the respective fields with the ones from your table. 


A data pipeline dashboard


SELECT 
  Customer_id, 
  DATE(Invoice_date) AS Invoice_date, 
  SUM(transaction_value) AS Invoice_total_amountFROM
FROM
  transactional_table
GROUP BY 
  Customer_id, 
  Invoice_date,
  Invoice_id


Run the query and then save it. Once back in the canvas, run the operator by hovering over it and pressing the play icon.


Repeat the same process for 2021 user invoice data and 2021+2020 user invoice data. 

You can use the same query sample we provided above.


Run the RFM model.

Drag and drop three RFM operators into the canvas and connect each one of them to the SQL operators.


A data pipeline dashboard



Hover over the first RFM operator and press the edit button. Use the drop-down menus to choose the table to be used and the columns that will determine the Recency, Frequency, and Monetary score. The corresponding columns will be unique Customer_id, Invoice_date, and Invoice_id. For this example you will leave the Spark configuration as is, but should you need to configure it for more advanced modeling, then use the dropdown arrow to open the settings.


Save your configuration and run the element by hovering over the RFM operator and pressing the play button. Repeat this process with the remaining two RFM operators.


A data pipeline dashboard


Extract email and phone numbers from your data to reach the group of customers with the highest score. 

To make the most out of your model, you want to have your customer’s contact information so you can send personalized communication via email campaigns, Facebook Ads, or even push notifications. One of the options to do this is to extract that information from your data source if it’s available inside of it. Just connect the Microsoft SQL Server data source to two SQL operators and copy and paste the below query. Run it and once back in the pipeline canvas run the SQL element by pressing on the play button. Repeat this process to retrieve phone number information using the second query sample listed below.


A data pipeline dashboard


A data pipeline dashboard


User email:


SELECT
rfm.*,
ordenesusuarios.user_email
FROM
`operator_6037c5cac5cb24e12fe55ccb__rfm_groups` AS rfm
JOIN (
SELECT
  UserId,
  user_email
FROM
  `Backbonedb__adataunick_dbo_view_bkb_ordenesusuarios`) AS ordenesusuarios
ON
rfm.user_id = CAST(ordenesusuarios.UserId AS STRING)


User phone number:


SELECT
rfm.*,
ordenesusuarios.user_email
FROM
`operator_6037d469966affa7d8e55d03__rfm_groups` AS rfm
JOIN (
SELECT
  UserId,
  user_email
FROM
  `Backbonedb__adataunick_dbo_view_bkb_ordenesusuarios`) AS ordenesusuarios
ON
rfm.user_id = CAST(ordenesusuarios.UserId AS STRING)



Send the model’s outcome to a CSV file.

Now, you want your model’s results to be sent automatically to a business application for further analysis or to immediately target those customers with the highest scores. In this example, you will send the outcome to a CSV file, and to do that you just simply need to drag and drop two CSV action elements which are located on the right-hand side menu. Connect each action element to the corresponding SQL operators. Click the edit button on each CSV element and name the table, choose the columns you wish to include, and enter the email the results should be sent. When done, click Save. 


A data pipeline dashboard


A data pipeline dashboard


Run the RFM model

It’s time to run your RFM model. Click the + sign and tap on Run. Now the pipeline will begin pulling the information from your data source, processing it, running the RFM models, and sending the results to the email you have chosen. 


RFM models should be at the top of your list on your marketing strategy in order to connect with your customers on a real meaningful level by sending them the information they truly want to receive. 


Once you set up an RFM model with Datagran, your teams will receive valuable insights from your customers which would otherwise remain stored in a server. 


Make use of these insights as a strategic tool to help your company increase revenue, truly understand your customer’s lifetime value, and reduce Churn. 


Once RFM segmentation is integrated into your customer list and e-commerce site, it is a relatively simple way of delivering more tailored messaging to your customers based on their past behaviors.


This RFM model is a black box model made by Datagran’s data scientists, nonetheless, if you need more advanced or personalized options go ahead and get in touch with our team.