Structify Makes Building Custom Datasets Easy with AI

Recognizing that AI agents could act as autonomous data gatherers, the co-founders built Structify to make custom dataset creation painless.
Structify’s AI agents do the hard work of gathering, validating, and refining large-scale datasets so that you don’t have to. Web scrapes, PDF backlogs, legacy SOAP APIs: no matter the source, Structify makes it easy to turn unstructured piles of raw data into neatly ordered tables.
We’re thrilled to lead Structify’s seed and to partner with Ronak Gandhi, Alex Goldstein and Alex Reichenbach. Here’s our interview with them on the origins of the company. Transcript edited for clarity and entertainment.
===
The Lights Go On for Alex & Ronak
Slater: Let’s start at the beginning. Alex (Reichenbach), Ronak: How did you guys meet? What was the first thing you worked on together?
Ronak: We met on our first day of college — right outside of our dorm hall.
Alex R: Yeah. We were two floors away from each other in the same hallway.
Slater: Did you guys take any classes together? Any memories from that?
Alex R: No. I mean, like, part of what made our friend group so nice was that we didn't overlap academically that much. But, even within the group, you [Ronak] also were just doing crazy things. So you took organic chemistry. So when you graduated, you graduated as a humanities major. But you took organic chemistry then you took quantum. So [Ronak] was just all over the place. And I was mainly interested in the computer science classes.
Ronak: There were a lot of random projects that we would do. Wasn't it that summer that you [Alex R] created the tweet bot?
Alex R: I was thinking about the lights. The lights were fun.
Slater: What’s that?
Alex R: [Laughs] We would throw occasional parties. And in our area, we had a double-decker ceiling, on the twelfth floor of a 14-story tower. We had the entire floor to ourselves. Normally, as a senior, you go live individually, but we were like “Eh, we like each other.” A lot of us were in doubles, but we had this beautiful central area. And we bought 80 feet of LED strip lights, which I then hooked up to every single power outlet. It was set up in, like, the most janky way possible. Ronak got up on a double-story ladder that we had stolen from some facility somewhere.
And I did a live fast Fourier transform that did beat analysis, so the entire room would react with the beat of the music.
Ronak: That was fun, yeah.
Slater: How’d it use FFT?
Alex R: Oh, so it does a fast Fourier transform of the music, so it does a spectrum analysis. Generally, the highest energy portions of a song are the hi-hats, but that's not what we emotionally react to. We react to the beat, which is low energy. So if you just do a power spectrum analysis of the song and have it react to that, it's gonna be different from what you'd expect. So we would have a decomposition that would do a rolling mean based off of three different bands that it would automatically find. And the lowest one would control the brightness of everything.
And then you would have little traveling, small strings of lights throughout. It would be hi-hats or whatever else. So it looked much more reactive than most people would expect. It was fun. And it was also, I mean, it was massive. It was two stories.
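Editor's sketch: the approach Alex R describes (split each audio frame's power spectrum into three bands, track a rolling mean, and let the lowest band drive brightness) can be illustrated roughly as follows. This is a minimal sketch and not the original dorm-room code. The fixed band edges (200 Hz and 2000 Hz), the window size, and every function name are assumptions made for illustration; the original reportedly found its bands automatically.

```python
import cmath
import math
from collections import deque


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def band_energies(frame, sample_rate, edges=(200.0, 2000.0)):
    """Sum the power spectrum of one frame into [low, mid, high] bands.

    The band edges are hypothetical fixed values, not auto-detected ones.
    """
    n = len(frame)
    spectrum = fft(frame)
    bands = [0.0, 0.0, 0.0]
    for k in range(1, n // 2):  # skip DC, stop below the Nyquist bin
        freq = k * sample_rate / n
        power = abs(spectrum[k]) ** 2
        if freq < edges[0]:
            bands[0] += power
        elif freq < edges[1]:
            bands[1] += power
        else:
            bands[2] += power
    return bands


class RollingMean:
    """Mean of the most recent `window` values."""

    def __init__(self, window):
        self.values = deque(maxlen=window)

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


def brightness(low_energy, low_mean):
    """Map low-band energy, relative to its rolling mean, to a 0..1 level."""
    if low_mean <= 0:
        return 0.0
    return min(1.0, low_energy / (2.0 * low_mean))
```

In a live setup, each incoming audio frame would go through `band_energies`, the low-band value would update a `RollingMean`, and `brightness` would set the overall LED level, while the mid and high bands could trigger the smaller traveling effects.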
The Structify NYC Pizza Dataset; Alex G Gets Conscripted
Slater: So you guys graduated, and then you went off and did different things for a little bit. And then you decided to come back together for Structify. So tell me about that moment. What was that like? What were the conversations that led up to it?
Ronak: We were at Joe's on Fourteenth. So, Alex was visiting. Alex loves pizza.
Alex R: I hadn’t visited New York in a while. I insisted that we go to every single pizza place that I wanted.
Ronak: We tried, like, 10 pizza places in one day. At the end of it, I had the cheese sweats. But somewhere between slice one and slice 10, we had gotten through a conversation of like, you know, basically I was the data guy for the Internet. I was doing all of these like spreadsheets of – trying to get all this bespoke stuff. Things like collecting all the museums on the East Coast, or making a list of all the influencers that we need to target. If I want to find all the different companies and job postings, can I put that in a dataset? Etc. It was half scraping, half manual entry – having interns, trying to find a vendor to do it.
It was an absolute pain gathering it, let alone getting it into the right quality. So I was ranting to Alex about this, and he was running into the same thing. And so it was that thread combined with the fact that I think that both of us wanted to work together… There's this problem that we both have a personal connection to and are really, really passionate about, combined with the idea that we want to spend our time working with the people we care about most.
Alex R: Ronak’s almost saying nice things over there.
[Laughs]
Slater: So where is the Structify New York City pizza dataset? What was the verdict? Was there a consensus on the best?
Alex R: I really like L’Industrie. I haven't been to Lucali yet.
Ronak: Haven't been to Lucali. We should go. I still think Sally's is probably best. Sally’s or Modern.
Slater: For anyone that’s listening, this is a sponsorship opportunity.
Ronak: Send us enough pizzas. Yeah. Of course. And we will make so many datasets for you!
Slater: Alex [Goldstein], how about you? What brought you into this?
Alex G: Yeah. So I did not have any data woes that brought me to this. I was leading robotics teams at a couple of different startups. That's how I got to know Alex, when he was also working at Matician, or Matic, as they call it now.
But I was on a job hiatus when Alex texted me and asked me to come help them build, because they [Structify] had customers, and the product was at a place where they had demos but needed to make it something that was really robust. So I decided to come. And as I explicitly said, I’ll come to New York for three months, and then 100% I am leaving and going back to San Francisco. I am not staying in New York or joining your startup. [Laughs. Well, that didn’t happen.] Pretty much everything changed in my mind over the course of those three months.
The Data Should be Commoditized
Slater: Amazing. Okay. Well, maybe shifting gears: beyond the technical insights, which we've covered a bit already, how do you guys think about your product philosophy?
Alex R: I would say there's a lot of excitement right now around the navigation and structuring portions of data gathering. But I get specifically excited around the combination of multiple sources. Our merging capability is something that I think is gonna be big for us long term, and somewhat non-consensus.
I think that the perfect agential navigation extraction system right now is going to be commoditized. But when you start combining multiple data sources, you start having those integrations, you start getting a real amount of stickiness on that. And that's only possible if you're combining the multiple sources and if you are able to have a good merging and deduplication system. And people just don't think about that when they think about this space. Yeah. So, it's not something that we push or talk about a lot. But it is fundamentally required to make a business like this work.
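Editor's sketch: to make "merging and deduplication" concrete, here is a toy illustration of combining records from two sources by matching on a normalized name. It is not Structify's system, whose internals are not described in this interview. The record shape, the `normalize_name` and `merge_sources` helpers, and the first-non-empty-value rule are all assumptions chosen for brevity; a production system would likely need fuzzier matching.

```python
import re


def normalize_name(name):
    """Build a match key: lowercase, strip punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def merge_sources(*sources):
    """Merge lists of dict records, deduplicating on the normalized name.

    For each field, the first non-empty value seen wins (a simplifying
    assumption; real systems usually weigh source reliability).
    """
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_name(record["name"])
            target = merged.setdefault(key, {})
            for field, value in record.items():
                if value not in (None, "") and field not in target:
                    target[field] = value
    return list(merged.values())


# Hypothetical records from two different sources describing one company.
from_web = [{"name": "Acme Corp.", "website": "acme.example"}]
from_pdf = [{"name": "ACME  corp", "website": "", "employees": 120}]
combined = merge_sources(from_web, from_pdf)
```

Here the two spellings collapse into a single row that carries the website from one source and the employee count from the other, which is the kind of cross-source enrichment the answer above points to.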
Ronak: I think there's a larger story two steps up from there. There's maybe two other tenets, the first of which is: the data should be commoditized.
I think what we've realized, especially in talking with a lot of our users, is that people wanna be able to iterate on their data. People currently treat it as very, very precious [because it’s hard to get, because it’s expensive, because they don’t control the production, …]. And because they treat it as very, very precious, they have to, painstakingly, be super, super precise and careful with it. But if you can make data this ephemeral thing that can live and die in your Jupyter notebook, then all of a sudden the way in which you interact with shipping a data pipeline goes from being this headache that we talked about to being something that's fast, iterative, and easy. And I think that’s the direction that the world is heading.
The second tenet is that the person who cares about the data should be the person who creates it. Those things are kind of separated right now, right? If we talk about construction and real estate, they have a really, really smart business understanding of the types of data that they're working with. They may not say the term data, but they know exactly what they're looking for. They know how to get it. They know all the input pieces, but they don't know how to get the data pipeline set up. And I think that because data science has been separated off into data teams, in a different part of the org, it means that there's this long feedback loop. It becomes this large process that is just inherently not iterative and becomes this bureaucratic thing in organizations. And it doesn't have to be.
Alex R: I think one more angle on this question: We chose a very simple thing which should be able to be done robustly. I think that this area has the potential to be one of the first instances where a data scientist or another employee is going to feel that they have been powered by the strength of a hundred people. Deep research might save you a couple hours of research. But creating one of these datasets… compared to the manual labor required, it probably will cumulatively save you years.
Alex G: There are a lot of people hyping the idea of using agents to do things like book plane tickets, which I think are the clothes-folding robots of agents, where it seems cool, but it doesn't actually help anyone. And then you get to deep research, which is legitimately useful but still narrow. And what Structify is doing will be broad and legitimately useful. It’s like an injection molding machine: something that's very simple, but which can do a very simple thing a lot of times, and because of that, is able to become pretty dang useful.