A few months ago we announced Segment as an integration, which lets you pull the web, mobile, server, and cloud app data stored in Segment into Datagran to be used across all of your data projects.


Today we are pleased to announce Segment as an Action. For companies using Segment as a CDP (Customer Data Platform), this is a particularly useful feature: from now on, users can send the output of Machine Learning models created in Datagran back into Segment, and from there funnel that data to the different business applications Segment supports.

Why is it important?

Currently, companies using Segment can only track events within their web, app, or server sources and distribute that data across business apps, either for analysis or to take a specific action. Running predictive analysis requires stitching together multiple apps and cloud servers, which can be time-consuming and difficult. With Datagran, companies can simply connect the Segment integration, pull all the data stored there, run Machine Learning models, and then send the output back to Segment, where users are automatically updated, or created if they don't exist.

A use case from one of our clients goes as follows:

  1. Client sends their transactional data to Segment which acts as a CDP.
  2. Segment not only sends the data to business applications but also stores it in Snowflake.
  3. Datagran connects to Snowflake to pull in the data Segment has stored there.
  4. Client builds ML models, such as sales predictions or clustering, and sends the output back to Segment.
  5. Client uses Segment to send data to different business applications to take action.


These are the inputs required by the Segment Action in Datagran:

  1. Write Key: This is an API key that allows Datagran to send data via Segment's REST API. For instructions on how to get the Write Key, please follow this link.

  2. Table: It refers to the table that you want to extract data from. Datagran will automatically detect your table once you connect the Segment Action to the specific operator (usually a SQL operator).

  3. Event name: Refers to the event that will hold your properties. For example, an event could be "Sales Predictions".

  4. Properties: Refers to the properties contained in an event. You can have as many properties as you want.

  5. Customer id: Refers to the customer ID in your database. If the user exists in Segment we will update the event and properties. If not, we will create a new customer.
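To make the mapping concrete, here is a minimal sketch of how those inputs could come together into a single Track payload. The function and parameter names (`build_track_payload`, `row`, `customer_id_column`, and so on) are illustrative, not Datagran's actual internals; only the `userId`/`event`/`properties` shape follows Segment's Track call.

```python
import json

# Hypothetical sketch: mapping the Segment Action's inputs onto one
# Track payload. Each row of the connected table becomes one event.
def build_track_payload(row, customer_id_column, event_name, property_columns):
    """Assemble a Track payload from a row of the connected table."""
    return {
        "userId": str(row[customer_id_column]),  # "Customer id" input
        "event": event_name,                     # "Event name" input
        "properties": {col: row[col] for col in property_columns},
    }

# One row of model output, e.g. from a SQL operator feeding the Action.
row = {"customer_id": "162514333", "iphone": "0.734", "apple_watch": "0.812"}
payload = build_track_payload(
    row, "customer_id", "Sales Prediction", ["iphone", "apple_watch"]
)
print(json.dumps(payload, sort_keys=True))
```

Each property column becomes one key in the event's properties, which is why you can have as many properties as you want.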

A simplified pipeline example with a Segment Action looks like this:


To use Segment as an Action, you must create an HTTP Source in your Segment Dashboard. Here is a sample of how it should look:



The output that Segment will receive would look something like this:

analytics.track('162514333', 'Sales Prediction', {'iphone': '0.734', 'apple_watch': '0.812'})


As mentioned previously, this method takes a user_id string, an event string, and a properties object.
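Under the hood, a Track call like the one above is a POST to Segment's HTTP Tracking API, with the Write Key sent as the username of an HTTP Basic auth header (empty password). The sketch below builds, but does not send, that request; the write key is a placeholder.

```python
import base64
import json
import urllib.request

WRITE_KEY = "YOUR_WRITE_KEY"  # placeholder; use your HTTP Source's key

# JSON body matching the track call shown above.
body = json.dumps({
    "userId": "162514333",
    "event": "Sales Prediction",
    "properties": {"iphone": "0.734", "apple_watch": "0.812"},
}).encode()

# Segment's Tracking API authenticates with the write key as the
# Basic auth username and an empty password.
auth = base64.b64encode(f"{WRITE_KEY}:".encode()).decode()
request = urllib.request.Request(
    "https://api.segment.io/v1/track",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {auth}",
    },
    method="POST",
)
# urllib.request.urlopen(request) would actually send it; omitted here.
```

In practice you would use Segment's client library rather than raw HTTP, but the request shape is useful to know when debugging an HTTP Source.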

Datagran sends data to Segment in batches, which are handled by Segment's library. The following is extracted from Segment's documentation:

Every method you call does not result in an HTTP request, but is queued in memory instead. Messages are flushed in batches in the background, which allows for much faster operation.

By default, Segment will flush:

  • every 100 messages (control with upload_size)
  • if 0.5 seconds has passed since the last flush (control with upload_interval)

There is a maximum of 500KB per batch request and 32KB per call.

What happens if there are just too many messages?

If the module detects that it can’t flush faster than it’s receiving messages, it’ll simply stop accepting messages. This means the Action will never crash because of a backed-up queue. The default max_queue_size is 10000.

Ready to take your Segment data further? Sign up here and follow this guide using your Segment account. Take advantage of your data and build machine learning models in no time with our easy-to-use pipeline tool, where you can build and deploy models such as RFM, Recommended Product, Clustering, and more.