What the Data Actually Looks Like
This post documents the intermediate data produced after scraping and initial structuring. It shows what is actually being analysed before any creative conclusions are formed.
Context
This post shows what the data looks like at an early stage in the research process.
At this point, the data has moved beyond raw scraping, but it has not yet been translated into a creative strategy.
This stage follows the initial data extraction process: Data Mining with LLMs.
It sits in an intermediate state.
- It is no longer unstructured input
- It is not yet a conclusion
- It is the stage where patterns begin to stabilise
This is the phase where the system starts to define what is possible.
The goal here is not to extract meaning directly, but to expose the structure that meaning will later depend on.
From Raw Data to Structured Signals
The initial input is large-scale scraped data.
On its own, this data is:
- inconsistent
- redundant
- noisy
- difficult to interpret
At this stage, it has been processed into grouped structures.
These structures do not represent conclusions.
They represent repeated patterns that persist across the dataset.
That persistence is what makes them useful.
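As a rough sketch, this collapse from noisy scrape to persistent signals can be illustrated with a normalise-and-count pass. Everything below is invented for illustration: the records, the tags, and the support threshold are not the actual pipeline.

```python
from collections import Counter

# Hypothetical scraped records: each a set of free-form tags
# extracted from one post. Values are illustrative only.
records = [
    {"neon", "night", "portrait"},
    {"Neon", "night", "street"},
    {"neon", "night", "portrait"},
    {"daylight", "landscape"},
]

def normalise(record):
    """Lowercase and strip tags so surface variants collapse together."""
    return frozenset(tag.strip().lower() for tag in record)

# Count how often each normalised signal recurs across the dataset.
signal_counts = Counter(tag for record in records for tag in normalise(record))

# Keep only signals that persist beyond a minimum support threshold;
# one-off noise is dropped.
MIN_SUPPORT = 2
persistent = {tag for tag, n in signal_counts.items() if n >= MIN_SUPPORT}
print(sorted(persistent))  # → ['neon', 'night', 'portrait']
```

The point of the sketch is the filter itself: a signal earns its place only by recurring, which is exactly the persistence the text describes.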
What This Data Represents
The data at this stage behaves differently from both raw input and final output.
It has three defining properties:
1. Compression
Large volumes of scraped data are reduced into clusters.
Each cluster acts as a compressed representation of:
- recurring themes
- common associations
- shared stylistic signals
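A minimal sketch of this compression, assuming records have already been grouped by similarity upstream: a group of many tag lists is reduced to a small summary of its most recurrent signals. The data and the `top_k` cutoff are illustrative, not the real dataset.

```python
from collections import Counter

def compress(group, top_k=3):
    """Reduce a group of tag lists to a compact cluster summary:
    the few signals that recur most often stand in for the whole group."""
    counts = Counter(tag for record in group for tag in record)
    return {
        "size": len(group),
        "core_signals": [tag for tag, _ in counts.most_common(top_k)],
    }

# Illustrative group of records already judged similar upstream.
group = [
    ["neon", "night", "portrait"],
    ["neon", "night", "street"],
    ["neon", "rain", "portrait"],
]
cluster = compress(group)
print(cluster)  # → {'size': 3, 'core_signals': ['neon', 'night', 'portrait']}
```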
2. Partial Meaning
The data begins to carry meaning, but that meaning is not yet stable.
- relationships are visible
- patterns are repeatable
- interpretation is still open
This is not analysis in the final sense.
It is pre-interpretation structure.
3. Constraint Formation
This is the critical function of this phase.
The data defines:
- what appears frequently
- what is suppressed
- what co-occurs
- what conflicts
These act as constraints on any future interpretation.
They do not tell you what to do.
They define what is likely to hold and what is likely to fail.
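Frequency, co-occurrence, and conflict can all be read off the same counts. The sketch below is a toy version of that idea: two signals that are each frequent but never appear together form a conflict constraint. The records and the `min_each` threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged records; names are illustrative only.
records = [
    ["neon", "night"],
    ["neon", "night", "portrait"],
    ["daylight", "landscape"],
    ["daylight", "landscape"],
    ["neon", "portrait"],
]

# How often each signal appears, and how often each pair co-occurs.
freq = Counter(tag for r in records for tag in r)
cooc = Counter(
    frozenset(pair)
    for r in records
    for pair in combinations(sorted(set(r)), 2)
)

def conflicts(a, b, min_each=2):
    """Two frequent signals that never appear together act as a
    conflict constraint: combining them is likely to fail."""
    return freq[a] >= min_each and freq[b] >= min_each and cooc[frozenset((a, b))] == 0

print(cooc[frozenset(("neon", "night"))])  # → 2
print(conflicts("neon", "daylight"))       # → True
print(conflicts("neon", "night"))          # → False
```

Nothing here prescribes a decision; the counts only mark which combinations the data supports and which it never exhibits.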
Example: Clustered Output
The following is an example of what this stage of the data looks like after initial structuring.
This is not raw data.
It is the result of grouping and compressing repeated patterns across the dataset.
The full dataset is available here:
Clustered Output Dataset (Instagram Image Analysis)
How to Read This
Each cluster represents a stable region within the dataset.
Within each cluster:
- Core Identity shows what consistently anchors the structure
- Orbitals and Amplifiers show what extends or reinforces it
- Symbolic Gravity shows how elements interact and constrain each other
- Internal Tensions show where instability exists
- Emergence & Leverage show where the structure can evolve
These are not categories imposed after the fact.
They are derived from what repeatedly appears in the data.
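One way to picture the shape of a cluster record is as a small schema. This is only a sketch: the field names mirror the labels above, but the types and the example values are assumptions; the published dataset defines the authoritative structure.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    """Sketch of one cluster as described in the post (types assumed)."""
    core_identity: list[str]                 # what consistently anchors the structure
    orbitals: list[str]                      # what extends the core
    amplifiers: list[str]                    # what reinforces it
    symbolic_gravity: dict[str, list[str]]   # how elements interact and constrain each other
    internal_tensions: list[tuple[str, str]] # pairs where instability exists
    emergence_leverage: list[str]            # where the structure can evolve

# Invented example values, for shape only.
example = Cluster(
    core_identity=["neon", "night"],
    orbitals=["rain"],
    amplifiers=["reflection"],
    symbolic_gravity={"neon": ["night", "reflection"]},
    internal_tensions=[("neon", "daylight")],
    emergence_leverage=["motion blur"],
)
print(example.core_identity)  # → ['neon', 'night']
```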
What This Stage Does
This stage does not produce a creative strategy.
It does something more fundamental.
It defines:
- the available building blocks
- the relationships between them
- the constraints that shape how they can be used
Without this step, any strategy is arbitrary.
With it, strategy becomes constrained by structure.
Why This Matters
Most approaches attempt to jump directly from raw data to conclusions.
That jump skips the stage where structure stabilises.
This intermediate layer is where:
- noise becomes pattern
- pattern becomes constraint
- constraint defines possibility
It is the first point where the system becomes predictable.
Boundary
This data does not explain:
- what the final creative direction should be
- what meaning should be extracted
- what decisions should be made
It only defines the space those decisions must operate within.
Subsequent stages build on this.
But they depend on it.
Final Observation
At this stage, the system is no longer asking:
What is in the data?
It is asking:
What structures persist strongly enough to constrain everything that comes next?