How to create empirical cumulative probability curve from observed data?

Analytica Q&A


Iván Andrés Arellano (@ivanarellanonaranjo), Topic starter, Jul 21, 2025 11:39 pm

Hello,

I have 20 observed daily production values and I want to create an empirical cumulative probability curve (NOT normal or any theoretical distribution) that I can view by clicking "Cumulative Probability" in the result view.

Production_Data := [415, 410, 409, 410, 425, 404, 416, 398, 422, 385, 410, 408, 412, 408, 426, 421, 423, 421, 418, 408]

I need the cumulative probability curve to be based exactly on these observed values (empirical distribution), not fitted to any theoretical distribution.

I've tried using Discrete() with unique values and their frequencies, but I keep getting "Value is not probabilistic" errors when trying to access the cumulative probability view.

What's the correct way to create an empirical distribution from observed data that supports the "Cumulative Probability" visualization?

Thanks!

2 Replies

Lonnie Chrisman (@lchrisman), Admin, Jul 22, 2025 7:43 am

First, you'll need to decide whether you want a continuous or discrete distribution. Given that you don't want it fitted in any way, I'm assuming you want a discrete distribution with these as the only possible values. For that, the easiest approach is just resampling. One (of many) possible ways to implement that is:

ChanceDist( 1, Production_data, Production_data )

I'll mention that the first parameter should (in a strict sense) actually be 1 / IndexLength(Production_data), but when you just want uniform resampling, ChanceDist accepts p:1 for convenience.
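To make concrete what uniform resampling implies, here is a minimal sketch in Python (not Analytica; the function names are made up for illustration). It assumes each of the 20 observations carries probability 1/20, so repeated values such as 408 and 410 accumulate weight, and it builds the step-shaped empirical CDF from that.

```python
import random
from collections import Counter

production_data = [415, 410, 409, 410, 425, 404, 416, 398, 422, 385,
                   410, 408, 412, 408, 426, 421, 423, 421, 418, 408]

def empirical_cdf(data):
    """Return sorted (value, cumulative probability) pairs.

    Every observation is weighted 1/N, so duplicates add up.
    """
    n = len(data)
    counts = Counter(data)
    cdf, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        cdf.append((value, running / n))
    return cdf

def resample(data, size, seed=0):
    """Draw `size` values uniformly, with replacement, from the observations."""
    rng = random.Random(seed)
    return rng.choices(data, k=size)

cdf = empirical_cdf(production_data)
sample = resample(production_data, 1000)

for value, p in cdf:
    print(value, p)
```

Every resampled value is one of the original observations, which is what makes the resulting distribution discrete and unfitted.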

For a continuous distribution you do need to adopt some distributional form, since something has to determine how to interpolate between these values and extrapolate the tails. Once again there are several choices. The Keelin MetaLog distribution is often a good choice, where you can use:

Keelin( Production_data, I:Production_data)

The Keelin distribution takes on the shape of your data -- if it happens to match a classic distribution, it can approximate that distribution very closely, but it covers a much wider space of possible shapes.

As a side note, for a distribution like this, displaying the PDF using Smoothing (rather than histogram) works better, I think, even though histograms are more robust across the full spectrum of all distribution types. To select smoothing, while viewing the PDF press Ctrl+U (uncertainty settings) and check the Smoothing radio button. This is just a graphing setting; it doesn't change the underlying result.

With the Keelin MetaLog, you have a couple of hyperparameters that you can adjust -- specifically the number of terms and the bounds. In the above example, Analytica auto-selects the number of terms it thinks is appropriate, but you can have it use a specific number of terms using, e.g.,

Keelin( Production_data, I:Production_data, nTerms: 7)
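For intuition about what the number of terms controls, here is a rough Python sketch (not Analytica) of the unbounded metalog quantile function as published by Keelin, where each term adds one coefficient. The coefficients below are made-up illustrative numbers, not values fitted to the production data, and whether Analytica evaluates the function in exactly this form internally is an assumption.

```python
import math

def metalog_quantile(y, a):
    """Evaluate the unbounded metalog quantile at cumulative probability y.

    `a` holds the coefficients a1..an; len(a) is the number of terms.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("y must be strictly between 0 and 1")
    logit = math.log(y / (1.0 - y))
    d = y - 0.5
    x = 0.0
    for k, coeff in enumerate(a, start=1):
        if k == 1:
            basis = 1.0
        elif k == 2:
            basis = logit
        elif k == 3:
            basis = d * logit
        elif k == 4:
            basis = d
        elif k % 2 == 1:
            # odd terms from the 5th onward: pure powers of (y - 0.5)
            basis = d ** ((k - 1) // 2)
        else:
            # even terms from the 6th onward: power of (y - 0.5) times the logit
            basis = d ** (k // 2 - 1) * logit
        x += coeff * basis
    return x

# Hypothetical 3-term coefficients, chosen only for illustration.
coeffs = [411.0, 5.0, 2.0]
for y in (0.1, 0.5, 0.9):
    print(y, metalog_quantile(y, coeffs))
```

More terms give the curve more freedom to follow the data, at the cost of possibly chasing noise in a sample as small as 20 points.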

These are unbounded continuous distributions, where the tails go to -INF and +INF. However, since the tails drop off exponentially fast, there is essentially zero probability below 300 or above 500 in this case. If you know from some other knowledge that there is a hard lower or upper bound, you can include either or both:

Keelin( Production_data, I:Production_data, nTerms:7, lb:350)

Keelin( Production_data, I:Production_data, nTerms:7, lb:350, ub:450)
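As a sketch of how bounds can be imposed on a metalog, Keelin's published formulation fits the unbounded form in a transformed space and maps it back with a log transform (one bound) or a logit transform (both bounds). The Python below (not Analytica) shows only that mapping; the function name is hypothetical, and it is an assumption that Analytica's lb and ub parameters follow this same construction.

```python
import math

def apply_bounds(z, lb=None, ub=None):
    """Map an unbounded quantile value z into the bounded range.

    z is the value of the unbounded metalog in the transformed space.
    """
    if lb is None and ub is None:
        return z                      # unbounded: no transform
    if ub is None:
        return lb + math.exp(z)       # lower bound only: always above lb
    if lb is None:
        return ub - math.exp(-z)      # upper bound only: always below ub
    ez = math.exp(z)
    return (lb + ub * ez) / (1.0 + ez)  # both bounds: stays between lb and ub

for z in (-5.0, 0.0, 5.0):
    print(z, apply_bounds(z, lb=350), apply_bounds(z, lb=350, ub=450))
```

Because the transform never reaches the bounds for finite z, sampled values stay strictly inside the stated limits.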

Your original data consisted of all integers, whereas Keelin is a continuous distribution. If you really want integers, I would probably just round:

Round( Keelin( Production_data, I: Production_data, nTerms:7 ) )

Lonnie Chrisman (@lchrisman), Admin, Jul 22, 2025 7:56 am

You had also said that you tried using Discrete(...). I thought I should also comment on that.

Discrete(...) is not a distribution function, but is instead a function used for specifying the Domain of a variable. When you set the Domain to "Discrete" using the domain-type pulldown, it sets the domain expression to:

Discrete( )

You can see this by selecting the "Expression" view in the Domain attribute.

Among other things, this can provide a clue to the PDF and CDF calculations as to whether it is computing a discrete or continuous distribution. It is also used by the optimizer for determining the type of decision variable, plus for a few other things. You could instead set the domain expression to be

Continuous( )

Anyway, Discrete(...) does sound like it might be a distribution function, but isn't.
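To picture why the discrete-versus-continuous hint matters for a PDF calculation, here is a small Python sketch (not Analytica, and not a description of Analytica's internals) that summarizes the same sample two ways: as a probability mass per distinct value, and as a binned density. The function names and the choice of 5 bins are assumptions made for illustration.

```python
from collections import Counter

production_data = [415, 410, 409, 410, 425, 404, 416, 398, 422, 385,
                   410, 408, 412, 408, 426, 421, 423, 421, 418, 408]

def probability_mass(sample):
    """Discrete view: probability of each distinct value."""
    n = len(sample)
    return {v: c / n for v, c in sorted(Counter(sample).items())}

def histogram_density(sample, bins):
    """Continuous view: (bin start, density) pairs over equal-width bins."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        # the maximum value falls into the last bin
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    n = len(sample)
    return [(lo + i * width, c / (n * width)) for i, c in enumerate(counts)]

pmf = probability_mass(production_data)
density = histogram_density(production_data, bins=5)
print(pmf)
print(density)
```

The masses sum to 1 directly, while the densities only integrate to 1 once multiplied by the bin width, so the two views cannot be plotted on the same vertical scale.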
