How to find, organize, and import your dataset

Importing your dataset into SEOmatic is your first step towards large-scale content generation.

Introduction

This guide provides a comprehensive overview of how to source, structure, and import datasets for your projects.

Finding Data

1. Web Scraping

Web scraping is a powerful tool for gathering data from the internet. You can use various scraping tools for this purpose:

  • Octoparse or Phantombuster: These tools are designed for scraping data from websites without needing to code.
  • AI Scraping Tools: Tools like WebscrapeAI use AI to scrape data from websites.
  • Freelance Services: Platforms like Fiverr or Upwork allow you to hire freelancers who can scrape data for you. This usually costs between $50 and $100.

2. Public and Open Data APIs

There are numerous public and open data APIs available, offering a wealth of data across different sectors:

  • Government APIs: Many governments provide public APIs with vast amounts of data. For example, the French Government's API offers various kinds of public data.
  • Private APIs: Companies often offer APIs with data related to their industry. For example, Thecompaniesapi.com provides business data.
  • Research Datasets: Platforms like Google Datasets or Kaggle offer large collections of research data suitable for diverse projects.
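API responses are often nested JSON, while a dataset needs flat rows and columns. The following is a minimal Python sketch of that flattening step. The sample payload and its field names (`name`, `address`, `city`, `country`) are invented for illustration and do not reflect the response format of any API mentioned above.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, e.g. address.city -> address_city
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

# Hypothetical API response, inlined so the example runs without network access
sample_response = """
[
  {"name": "Acme", "address": {"city": "Paris", "country": "FR"}},
  {"name": "Globex", "address": {"city": "Lyon", "country": "FR"}}
]
"""

rows = [flatten(record) for record in json.loads(sample_response)]
for row in rows:
    print(row)
```

Each flattened key (`name`, `address_city`, `address_country`) can then become a column in your dataset.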

3. Request Data from OpenAI

You can also ask an OpenAI model to generate a dataset, using a prompt such as:

Create a dataset with {{ x }} number of rows related to {{ keyword }}

Note: Be aware of potential limitations when requesting data from OpenAI. There may be constraints on the number of rows available or the possibility of encountering 'hallucinated' (inaccurate or fabricated) data.
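In the prompt above, `{{ x }}` and `{{ keyword }}` are placeholders to replace with your own values. As a small illustration, the Python sketch below fills them in. The function name `build_dataset_prompt` and the example keyword are hypothetical.

```python
PROMPT_TEMPLATE = "Create a dataset with {{ x }} number of rows related to {{ keyword }}"

def build_dataset_prompt(rows, keyword):
    """Fill the {{ x }} and {{ keyword }} placeholders of the prompt template."""
    if rows <= 0:
        raise ValueError("rows must be a positive integer")
    prompt = PROMPT_TEMPLATE.replace("{{ x }}", str(rows))
    return prompt.replace("{{ keyword }}", keyword.strip())

prompt = build_dataset_prompt(50, "coworking spaces in Lyon")
print(prompt)
```

Keeping the requested row count modest makes it easier to review the generated rows for fabricated entries.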

4. Datasets Library

Our platform offers access to over 200 pre-scraped datasets on Kaggle. These datasets cover a wide range of subjects and industries, providing a rich starting point for various projects. Each dataset is manually selected to ensure quality and relevance.

Organizing Your Dataset

  • Header Row: The first row should be a header with column names, serving as variables in content templates.
  • Data Rows: Each row represents a single data entry. Variables in templates will be replaced with actual data during content generation.
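To make the relationship between header row, data rows, and templates concrete, here is a rough Python sketch of the substitution idea. It assumes a `{{ column }}` placeholder syntax and uses an invented two-column dataset. It illustrates the concept only and is not how SEOmatic itself implements it.

```python
import csv
import io
import re

# Hypothetical dataset: the first row is the header, each later row is one entry
SAMPLE_CSV = """city,country
Lyon,France
Porto,Portugal
"""

def render(template, row):
    """Replace each {{ column }} placeholder with the row's value for that column."""
    def substitute(match):
        column = match.group(1)
        # Leave placeholders untouched when no matching column exists
        return row.get(column, match.group(0))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = "Best things to do in {{ city }}, {{ country }}"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
titles = [render(template, row) for row in rows]
for title in titles:
    print(title)
```

One row in the dataset produces one piece of content, so two data rows yield two titles here.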

Special Considerations for Dynamic Content:

  • Image URLs: If your dataset includes image URLs, ensure they point to files in a supported format such as .jpg, .png, .gif, or .webp. This is crucial for maintaining the quality and compatibility of dynamic images.
  • Dynamic Links: Incorporate a column dedicated to links if your project requires dynamic linking. This allows for seamless integration of external resources.
  • Listicle Content: For datasets intended to create listicles, include content with bullet points. This format aids in structuring articles or sections as clear, concise lists.
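The image URL check can be automated before import. The sketch below is one possible approach in Python, assuming the only accepted extensions are the four listed above. The helper name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Extensions taken from the list above
ALLOWED_EXTENSIONS = (".jpg", ".png", ".gif", ".webp")

def has_supported_image_extension(url):
    """Return True if the URL is http(s) and its path ends with an allowed extension."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Check the path only, so query strings such as ?width=800 are ignored
    return parsed.path.lower().endswith(ALLOWED_EXTENSIONS)

urls = [
    "https://example.com/images/lyon.jpg",
    "https://example.com/images/porto.WEBP?width=800",
    "https://example.com/images/map.svg",
    "not a url",
]
results = [has_supported_image_extension(url) for url in urls]
for url, ok in zip(urls, results):
    print(f"{'OK ' if ok else 'BAD'} {url}")
```

This only inspects the URL text. It does not confirm that the image actually exists or loads.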

Here is what a good dataset looks like:

[Image: example of a good dataset]

And here is how the data can be used for dynamic content creation:

[Image: data used for dynamic content creation]

Importing Your Dataset

The final step is importing your data into our system. To do this, export your data as a CSV file. Once the file is ready and properly formatted, you can import it.
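If your data lives in a script rather than a spreadsheet, Python's standard `csv` module can produce the CSV file. The sketch below uses invented rows and column names. The main point is that the module quotes cells containing commas or quotation marks, which hand-built CSV strings often get wrong.

```python
import csv
import io

def export_csv(rows, columns, destination):
    """Write a header row followed by one line per data row."""
    writer = csv.DictWriter(destination, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical rows; note the comma and the quotes inside the descriptions
rows = [
    {"city": "Lyon", "description": "Food capital, riverside walks"},
    {"city": "Porto", "description": 'Known for "port" wine'},
]

buffer = io.StringIO()
export_csv(rows, ["city", "description"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of memory, pass a file opened with `open("dataset.csv", "w", newline="", encoding="utf-8")` as the destination. The file name is only an example.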

Advanced Tips and Tricks

  • Experiment with Custom AI prompts to tailor the output to your specific content needs. This can enhance the relevance and uniqueness of the generated content.

Best Practices

  • Regularly validate your datasets to ensure they are free of errors and inconsistencies.
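As one way to put this into practice, the Python sketch below checks a CSV for a few common problems: rows with the wrong number of cells, empty cells, and duplicate rows. The specific checks and the sample data are assumptions chosen for illustration, not an official list of import requirements.

```python
import csv
import io

def validate_dataset(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        return ["file is empty or has no header row"]
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")
    seen = set()
    # Row 1 is the header, so data rows are numbered from 2
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        if any(cell.strip() == "" for cell in row):
            problems.append(f"row {row_number}: contains an empty cell")
        key = tuple(row)
        if key in seen:
            problems.append(f"row {row_number}: duplicate of an earlier row")
        seen.add(key)
    return problems

# Hypothetical dataset with three deliberate problems
sample = "city,country\nLyon,France\nPorto\nLyon,France\nNice,\n"
problems = validate_dataset(sample)
for problem in problems:
    print(problem)
```

An empty result means none of these particular checks failed. It does not guarantee the data itself is accurate.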

Contact and Support

For further assistance, feel free to reach out to our support team. Contact us at contact@seomatic.ai or via chat.
