I'm working on a website that functions as an index of tradespeople in Ireland. It allows users to register a trade profile detailing their services, which other users can then review. I have this functionality pretty much built out, but until recently I had very little test data. That led me on a journey into the world of database seeding: the generation of test data that mimics real-world data. Here's how I got on.

The goal

I wanted to generate test data that was similar to what would be seen in a real-world scenario. A bunch of lorem ipsum wouldn't have been good enough. This is where the Faker module came into play.

Faker allows for the generation of realistic data. For example, if you want to generate a user's details using Faker, it would look something like the following.

Faker faker = new Faker();

String firstName = faker.name().firstName();     // John
String lastName = faker.name().lastName();       // Wayne
String email = faker.internet().emailAddress();  // rooster.cogburn@gmail.com

Faker generates data for many scenarios (79 domains at the time of writing), which covered my need for realistic-looking text.

But I didn't just want to generate the text, I wanted to store it in the database. This is where things got complicated. The Spring pattern that I usually follow for data loading is to create a DataLoader component class that implements the CommandLineRunner interface. I then add some logic in the run method to populate my dev database. Usually this means generating one-off objects in a big loop and persisting them to the database. Long story short, there was no particular method to the madness. That was until recently, when I learned of Laravel's approach to seeding via factory classes.

Laravel is a PHP framework that has some neat ideas around eloquent code, which basically means developer experience. The gist of Laravel's approach to seeding is to use factories with overwritable blueprints to generate entity objects, which is a good idea. The Laravel implementation, however, seemed a little overcomplicated to me, with many helper methods that could only be implemented due to the looseness of PHP.

My goal was to implement the same functionality in Java but with a much more limited API. I wanted an abstract base class, BlueprintFactory<T>, which could be extended to allow the creation of objects based off a blueprint. I also wanted to limit the caller API to two methods: with and create. create is used to create the object, and with is used to supply custom setters, which is useful when dealing with related entities. With this goal in mind, I set to work.

Usage

Before implementing something like this, I like to start off by thinking about how the consumer code should look. Starting with how I want to use it gives me a better idea of how to implement it. My initial thought was to have an abstract BlueprintFactory<T> class that would force the extending class to implement an abstract T blueprint(); method. In the case of a UserFactory class, that might look like the following.


@Component
@AllArgsConstructor
public class UserFactory extends BlueprintFactory<AppUser> {
    private final Faker faker;
    private final PasswordEncoder passwordEncoder;

    @Override
    protected AppUser blueprint() {
        var user = new AppUser();
        user.setEmail(faker.internet().emailAddress());
        user.setPassword(passwordEncoder.encode("password"));
        return user;
    }
}

The above is clean and clearly states its intent. It's also nice and concise, as the parent class implements the with and create methods. The blueprint method is called each time an object is created, which means that each object gets its own data generated by Faker.

I'm happy with this approach to creating the factories. So then I look at using the factories in a seeder class.

The seeder will be the class that stores the objects created via the factories in a database. So I write out how I want my UserSeeder to look. In my case the AppUser entity also has a one-to-one relationship with a UserProfile entity, so I'll need to create and store that along with the AppUser. It's a good idea to give each entity its own factory, so I create the following UserProfileFactory.


@Component
@AllArgsConstructor
public class UserProfileFactory extends BlueprintFactory<UserProfile> {
    private final Faker faker;

    @Override
    protected UserProfile blueprint() {
        UserProfile userProfile = new UserProfile();
        userProfile.setFirstName(faker.name().firstName());
        userProfile.setLastName(faker.name().lastName());
        return userProfile;
    }
}

It's also a good idea not to call one factory from inside another. You may be tempted to create the UserProfile inside the UserFactory blueprint. This is better handled inside the seeder, where the related entity can be passed during object creation via the with method.

With the UserFactory and UserProfileFactory set up, I can now go about writing the seeder. The UserSeeder will create the AppUser and UserProfile entities using the factories we created. It will also handle the relationship assignments. For example, we will want to set the AppUser that owns the UserProfile when it is created, using the with method. The with method will return a new instance of the factory, with the custom setter in place, to allow for composability. The UserSeeder should look as follows.


@Order(1)
@Component
@Profile("dev")
@RequiredArgsConstructor
public class UserSeeder implements CommandLineRunner {
    private final UserFactory userFactory;
    private final AppUserRepository appUserRepository;
    private final UserProfileFactory userProfileFactory;
    private final UserProfileRepository userProfileRepository;

    @Override
    @Transactional
    public void run(String... args) {
        if (appUserRepository.count() > 0) {
            return;
        }

        var savedUsers = appUserRepository.saveAll(userFactory.create(50));

        var userProfiles = new ArrayList<UserProfile>();
        for (var user : savedUsers) {
            userProfiles.add(userProfileFactory.with(() -> user, UserProfile::setUser).create());
        }

        userProfileRepository.saveAll(userProfiles);
    }
}

In this class we create 50 AppUser entities by passing 50 to the create method. I then assign a UserProfile to each of those. All of these are persisted to the database.

An area of note is the following for loop.

for (var user : savedUsers) {
    userProfiles.add(
            userProfileFactory.with(() -> user, UserProfile::setUser).create()
    );
}

Here we loop through the already saved users, creating a UserProfile for each one. The interesting part is how the user is passed to the user profile. The with method adds a custom setter, meaning that UserProfile::setUser will be called with the result of () -> user. This sets that AppUser instance on the newly created UserProfile.

Some other things to note here: I add an @Order annotation to specify that I want this seeder to run first, as well as the @Profile annotation to say I want it to run only in dev. I also check whether there are any users in the database before running the seeder, which ensures the database is only seeded when empty.

You may want to create a test user in your seeder. You can do this by passing a variant to the create method. This will overwrite the specified fields.

AppUser testUser = appUserRepository.save(userFactory.create(Map.of("email", "test@test.com")));

The above creates a test user with the email "test@test.com". The factory reads the keys of the map and overwrites the matching value on the blueprint.
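
To make the key-to-field matching concrete, here is a minimal sketch of applying a map of overrides to an object. It uses plain reflection as a stand-in for the object mapper mentioned later in this post, so it is an illustration of the idea rather than how BlueprintFactory actually does it. The names MapOverrideSketch and applyOverrides, and the cut-down AppUser class, are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical sketch: reflection stands in for the library's object mapper.
final class MapOverrideSketch {

    // Cut-down stand-in for the AppUser entity, with "blueprint" defaults.
    static final class AppUser {
        private String email = "generated@example.com";
        private String password = "encoded-password";

        String getEmail() { return email; }
        String getPassword() { return password; }
    }

    // Copies each map entry onto the field whose name matches the key.
    // Fields that are not mentioned in the map keep their existing values.
    static <T> T applyOverrides(T target, Map<String, ?> overrides) {
        for (Map.Entry<String, ?> entry : overrides.entrySet()) {
            try {
                Field field = target.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(target, entry.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot override field: " + entry.getKey(), e);
            }
        }
        return target;
    }

    public static void main(String[] args) {
        AppUser testUser = applyOverrides(new AppUser(), Map.of("email", "test@test.com"));
        System.out.println(testUser.getEmail());    // test@test.com
        System.out.println(testUser.getPassword()); // encoded-password
    }
}
```

Only the email changes; the password keeps whatever the blueprint produced.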

Implementation

The implementation comes down to three main areas.

  • BlueprintFactory<T>: the extensible base
  • with: allowing composable factory instances
  • create: construction of objects

There is a lot to go through in each, so I will keep it brief.

BlueprintFactory<T>

This is an abstract class that forces the extender to implement an abstract T blueprint(); method. This method gets called each time an object is created. Any custom setters are then applied to its result, before the object is passed to an object mapper, which applies any overwrites specified in the create method call.

So, for clarity: if a custom setter is added, it will be applied before the variant/overwrite supplied via the create method.
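
The ordering is easier to see in code. What follows is a rough sketch of that pipeline only, not the library's source: CreatePipelineSketch, SketchFactory, addCustomSetter and Person are hypothetical names, the setter list is mutated in place for brevity, and a plain Consumer stands in for the object-mapper merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the order of operations only; not the library's source.
final class CreatePipelineSketch {

    static final class Person {
        String name;
        String role;
    }

    abstract static class SketchFactory<T> {
        private final List<Consumer<T>> customSetters = new ArrayList<>();

        protected abstract T blueprint();

        // Mutates in place for brevity; the with method described in this post
        // returns a new factory instead.
        void addCustomSetter(Consumer<T> setter) {
            customSetters.add(setter);
        }

        // The override parameter stands in for the object-mapper merge step.
        T create(Consumer<T> override) {
            T object = blueprint();                                  // 1. fresh object
            customSetters.forEach(setter -> setter.accept(object)); // 2. custom setters
            override.accept(object);                                 // 3. overrides, applied last
            return object;
        }
    }

    static Person demo() {
        SketchFactory<Person> factory = new SketchFactory<Person>() {
            @Override
            protected Person blueprint() {
                Person person = new Person();
                person.name = "from blueprint";
                person.role = "from blueprint";
                return person;
            }
        };
        factory.addCustomSetter(person -> {
            person.name = "from setter";
            person.role = "from setter";
        });
        return factory.create(person -> person.name = "from override");
    }

    public static void main(String[] args) {
        Person person = demo();
        System.out.println(person.name); // from override
        System.out.println(person.role); // from setter
    }
}
```

Both the setter and the override touch name, and the override wins because it runs last.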

with

This method exists to handle the setting of related entities. As I stated earlier, it's good practice not to create any related entities inside your factory. The setting of related entities should be handled solely by the with method.

Let's take the example of a User entity that can own many Post entities. In our seeder we may want to create the User entity first, then apply that User to a bunch of Post entities when they are created. The with method can specify that the Post::setUser setter should always be called with the supplied User, like so:

var myUserPosts = postFactory.with(() -> myUser, Post::setUser).create(50);

The above will create 50 posts belonging to myUser. As the with method returns a new instance of the factory with the custom setter applied, it can also be made composable like so:

var myUserFactory = postFactory.with(() -> myUser, Post::setUser);
var myUserPosts = myUserFactory.create(50);

This is achieved by an internal constructor, protected BlueprintFactory(List<Consumer<T>> customSetters), in which a copy of the factory's custom setters is passed to a new instance of said factory. That new instance is then returned. This allows for the construction of composable factories.
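
As a rough illustration of that copy-on-with behaviour, here is a simplified sketch. It assumes a concrete factory that takes its blueprint as a Supplier, which sidesteps the question of how an abstract base class instantiates its subclass; the real BlueprintFactory may well do this differently. ComposableWithSketch, SketchFactory and Post are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: a concrete factory with a Supplier-based blueprint.
final class ComposableWithSketch {

    static final class Post {
        String author;
        String status;

        void setAuthor(String author) { this.author = author; }
        void setStatus(String status) { this.status = status; }
    }

    static final class SketchFactory<T> {
        private final Supplier<T> blueprint;
        private final List<Consumer<T>> customSetters;

        SketchFactory(Supplier<T> blueprint) {
            this(blueprint, List.of());
        }

        // Plays the role of the internal constructor that receives the setters.
        private SketchFactory(Supplier<T> blueprint, List<Consumer<T>> customSetters) {
            this.blueprint = blueprint;
            this.customSetters = customSetters;
        }

        // Returns a NEW factory; this factory's setter list is never modified.
        <R> SketchFactory<T> with(Supplier<R> value, BiConsumer<T, R> setter) {
            List<Consumer<T>> copy = new ArrayList<>(customSetters);
            copy.add(target -> setter.accept(target, value.get()));
            return new SketchFactory<>(blueprint, copy);
        }

        T create() {
            T object = blueprint.get();
            customSetters.forEach(setter -> setter.accept(object));
            return object;
        }

        List<T> create(int count) {
            List<T> objects = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                objects.add(create()); // the blueprint runs once per object
            }
            return objects;
        }
    }

    public static void main(String[] args) {
        SketchFactory<Post> postFactory = new SketchFactory<>(Post::new);
        SketchFactory<Post> byBrian = postFactory.with(() -> "brian", Post::setAuthor);
        SketchFactory<Post> brianDrafts = byBrian.with(() -> "draft", Post::setStatus);

        System.out.println(postFactory.create().author); // null, base factory is untouched
        System.out.println(byBrian.create(3).size());     // 3
        System.out.println(brianDrafts.create().status);  // draft
    }
}
```

Because each call to with copies the list, a derived factory can be handed around and extended further without affecting the factory it came from.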

In order to support applying either a single entity or a list of entities, method overloads are used to match the type of the supplied argument.

  • with(Supplier<R>, BiConsumer<T, R>): Injects a single related object
  • with(Function<Integer, List<R>>, int, BiConsumer<T, List<R>>): Injects a list of related objects
  • with(Function<V, R>, V, BiConsumer<T, R>): Injects a value based on a variant
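
One way to think about these three shapes is that each one collapses into a single deferred Consumer<T>. The sketch below shows that idea under a few assumptions: it uses separately named static helpers (one, many, variant) instead of true overloads to keep type inference simple, and Post, makeTags and titleFor are hypothetical stand-ins. It is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: three argument shapes, each reduced to a Consumer<T>.
final class WithOverloadsSketch {

    static final class Post {
        String author;
        List<String> tags;
        String title;

        void setAuthor(String author) { this.author = author; }
        void setTags(List<String> tags) { this.tags = tags; }
        void setTitle(String title) { this.title = title; }
    }

    // Shape 1: a single related value from a supplier.
    static <T, R> Consumer<T> one(Supplier<R> source, BiConsumer<T, R> setter) {
        return target -> setter.accept(target, source.get());
    }

    // Shape 2: a list of related values, produced from a count.
    static <T, R> Consumer<T> many(Function<Integer, List<R>> source, int count,
                                   BiConsumer<T, List<R>> setter) {
        return target -> setter.accept(target, source.apply(count));
    }

    // Shape 3: a single related value derived from a variant.
    static <T, V, R> Consumer<T> variant(Function<V, R> source, V variant,
                                         BiConsumer<T, R> setter) {
        return target -> setter.accept(target, source.apply(variant));
    }

    // Stand-in for something like a related factory's create(int).
    static List<String> makeTags(int count) {
        List<String> tags = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            tags.add("tag-" + i);
        }
        return tags;
    }

    // Stand-in for something like a related factory's create(variant).
    static String titleFor(String variant) {
        return "Untitled (" + variant + ")";
    }

    static Post demo() {
        Consumer<Post> author = one(() -> "brian", Post::setAuthor);
        Consumer<Post> tags = many(WithOverloadsSketch::makeTags, 2, Post::setTags);
        Consumer<Post> title = variant(WithOverloadsSketch::titleFor, "draft", Post::setTitle);

        Post post = new Post();
        List.of(author, tags, title).forEach(setter -> setter.accept(post));
        return post;
    }

    public static void main(String[] args) {
        Post post = demo();
        System.out.println(post.author); // brian
        System.out.println(post.tags);   // [tag-1, tag-2]
        System.out.println(post.title);  // Untitled (draft)
    }
}
```

In each case nothing is evaluated until the setter is applied to a freshly created object, so every created object gets its own call to the source.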

I made a big effort with these overloads to keep the public API as simple as possible while still allowing for flexibility.

create

The create method is self-explanatory. It creates the new object based off the object returned by the blueprint method. It calls blueprint, applies any custom setters to the returned object, and then passes the modified object to an object mapper, which merges any overwrites that have been passed to the create method call.

Similar to the with method, there are a number of overloads to allow for different usages of create.

  • create(): Builds a single instance using the blueprint and custom setters
  • create(Map<String, ?> variation): Applies field overrides using a map
  • create(T variation): Merges fields from an existing instance
  • create(int count): Batch creation of count instances, each built from its own blueprint call
  • create(VariantList<T>): Batch creation using pre-defined entity variants
  • create(VariantMapList): Batch creation using field maps (variant by map)

I was forced to create some custom types here to avoid ambiguity with the type checker. This was caused by allowing both a List<Map<String, ?>> and a List<T> to be passed to create, to allow the creation of multiple variants. This is called a Sequence in the Laravel terminology. Because of type erasure, both overloads would end up with the same signature, create(List), which the compiler rejects. To get around this I had to create some wrapper classes, VariantMapList and VariantList<T>, to distinguish between the types. This creates some overhead, but until I can find a better solution I will put up with it to keep the public API concise.
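
Here is a small sketch of why the wrappers help. The wrapper names match the ones above, but their fields, constructors and the string-returning create methods are assumptions made for illustration; the real classes will look different.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: wrapper types give each overload a distinct erased signature.
final class VariantWrapperSketch {

    // These two overloads cannot live in the same class, because after type
    // erasure both become create(List):
    //   List<T> create(List<T> variants)
    //   List<T> create(List<Map<String, ?>> variants)

    static final class VariantList<T> {
        final List<T> items;
        VariantList(List<T> items) { this.items = items; }
    }

    static final class VariantMapList {
        final List<Map<String, ?>> items;
        VariantMapList(List<Map<String, ?>> items) { this.items = items; }
    }

    static final class SketchFactory<T> {
        // Erases to create(VariantList).
        String create(VariantList<T> variants) {
            return "entity variants: " + variants.items.size();
        }

        // Erases to create(VariantMapList), so there is no clash.
        String create(VariantMapList variants) {
            return "map variants: " + variants.items.size();
        }
    }

    public static void main(String[] args) {
        SketchFactory<String> factory = new SketchFactory<>();

        VariantList<String> entities = new VariantList<>(List.of("a", "b", "c"));
        List<Map<String, ?>> maps = List.of(
                Map.of("email", "one@test.com"),
                Map.of("email", "two@test.com"));

        System.out.println(factory.create(entities));                 // entity variants: 3
        System.out.println(factory.create(new VariantMapList(maps))); // map variants: 2
    }
}
```

The cost is one extra wrapping step at the call site, which is the overhead mentioned above.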

The module

BlueprintFactory has been uploaded to Maven Central. You can read the docs and learn how to use it in your own projects at BrianDouglasIE/BlueprintFactory.

Feel free to create a GitHub issue with any suggested improvements.