Creating and saving relationships with Spring Data in Neo4j takes too much time


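I am trying to migrate JSON documents from Couchbase to Neo4j. When I receive a document, I read some of its fields to work out which type of object to create. Every node class inherits some node properties from a Transaction class, such as the ID, the labels and a version field (used for optimistic locking).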

Example of a node class:

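import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Property;
import org.springframework.data.neo4j.core.schema.Relationship;

@Node
public class User extends Transaction {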

    public static final String featureID = "12109";

    public static final String featureVariantID = "000";

    @Property("FullName")
    private String fullName;

    @Property(name = "Alias")
    private String alias;

    @Property(name = "EmploymentType")
    private String employmentType;

    @Property(name = "EmployeeCode")
    private String employeeCode;

    @Relationship("LineManager")
    Set<RelatedTo<User>> lineManagers;

    @Relationship("FunctionalManager")
    Set<RelatedTo<User>> functionalManagers;

    @Relationship("LegalEntity")
    Set<RelatedTo<LegalEntity>> legalEntities;

    // More relations like these

    

    public User() {

    }

    public User(String transactionID, String tenantID) {
        super(featureID, featureVariantID, transactionID, tenantID);
        this.fullName = "";
        this.alias = "";
        this.employmentType = "";
        this.employeeCode = "";
        lineManagers = new LinkedHashSet<>();
        functionalManagers = new LinkedHashSet<>();
        legalEntities = new LinkedHashSet<>();
        // ....
    }
    
    //Getters and Setters

    // Used to get which fields of the JSON document are to be used to populate the relationship sets
    private static final Map<String, String> allowedRelationships = new HashMap<>();
    static {
        allowedRelationships.put("LineManager", "Data.LineManagerUserID");
        allowedRelationships.put("FunctionalManager", "Data.FunctionalManagerUserID");
        allowedRelationships.put("LegalEntity", "Data.EmployeeLegalEntityID");
        
    }

    public Map<String, String> getAllowedRelationships() {
        return User.allowedRelationships;
    }

}
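`User.allowedRelationships` maps each relationship type to a dotted path into the JSON document (for example `Data.LineManagerUserID`). Assuming the document has already been parsed into nested Maps (e.g. by Jackson), a hypothetical helper like the one below could resolve those paths. The class name `RelationshipPathResolver` and the sample document are illustrative only, not part of the question's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelationshipPathResolver {

    // Hypothetical helper: walks a dotted path such as "Data.LineManagerUserID"
    // through a JSON document that has been parsed into nested Maps.
    // Returns null if a segment is missing or an intermediate value is not an object.
    static Object resolve(Map<String, Object> document, String dottedPath) {
        Object current = document;
        for (String segment : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("LineManagerUserID", "u-42");
        data.put("EmployeeLegalEntityID", "le-7");
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("Data", data);

        // Same shape as User.allowedRelationships: relationship type -> JSON path
        Map<String, String> allowed = new LinkedHashMap<>();
        allowed.put("LineManager", "Data.LineManagerUserID");
        allowed.put("FunctionalManager", "Data.FunctionalManagerUserID");
        allowed.put("LegalEntity", "Data.EmployeeLegalEntityID");

        allowed.forEach((type, path) ->
                System.out.println(type + " -> " + resolve(document, path)));
    }
}
```

A `null` result means the document has no value for that relationship, so the corresponding set can simply be left empty.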

Apart from this one, there are 12 other classes with a similar structure.
The relationship entity is structured like this:

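import org.springframework.data.neo4j.core.schema.Property;
import org.springframework.data.neo4j.core.schema.RelationshipProperties;

@RelationshipProperties
public class RelatedTo<T extends Transaction> extends BaseRelationship<T> {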

    @Property("DocumentID")
    private String documentID;

    public RelatedTo() {

    }

    public RelatedTo(String effectiveFromTimestamp, String effectiveTillTimestamp, String status, T target, String documentID) {
        // BaseRelationship's constructor takes (till, from), the reverse of this constructor's order
        super(effectiveTillTimestamp, effectiveFromTimestamp, status, target);
        this.documentID = documentID;
    }

    //Getter and Setter
}

It inherits from this parent class:

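import org.springframework.data.neo4j.core.schema.GeneratedValue;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Property;
import org.springframework.data.neo4j.core.schema.RelationshipProperties;
import org.springframework.data.neo4j.core.schema.TargetNode;

@RelationshipProperties
public class BaseRelationship<Target extends Transaction> {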

    @Id
    @GeneratedValue
    private Long id;

    @Property("EffectiveTillTimestamp")
    private String effectiveTillTimestamp;

    @Property("EffectiveFromTimestamp")
    private String effectiveFromTimestamp;

    @Property("Status")
    private String status;

    @TargetNode
    private Target targetNode;

    public BaseRelationship(String effectiveTillTimestamp, String effectiveFromTimestamp, String status, Target targetNode) {
        this.effectiveTillTimestamp = effectiveTillTimestamp;
        this.effectiveFromTimestamp = effectiveFromTimestamp;
        this.status = status;
        this.targetNode = targetNode;
    }

    public BaseRelationship() {
    }

    // Getters and Setters
}

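To determine which type of object to create, I use reflection, which returns a

<T extends Transaction> Class<T>

object. I then create the node and save it to Neo4j. After that, I read the document again to create the node's relationships (with some data processing along the way) and then save the node again. For example: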
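A minimal sketch of one way to do the type lookup, assuming each entity class can be identified by its `featureID` and has a `(transactionID, tenantID)` constructor like `User`'s. The registry, the trimmed-down stand-in classes and the `"00000"` placeholder ID are hypothetical. Reflective instantiation like this is likely cheap compared with a database round trip, so it is probably not where the 4 to 10 seconds go.

```java
import java.util.Map;

public class TransactionFactory {

    // Simplified, hypothetical stand-ins for the question's entity classes.
    abstract static class Transaction {
        final String transactionID;
        final String tenantID;

        Transaction(String transactionID, String tenantID) {
            this.transactionID = transactionID;
            this.tenantID = tenantID;
        }
    }

    static class User extends Transaction {
        public User(String transactionID, String tenantID) {
            super(transactionID, tenantID);
        }
    }

    static class LegalEntity extends Transaction {
        public LegalEntity(String transactionID, String tenantID) {
            super(transactionID, tenantID);
        }
    }

    // Hypothetical registry, built once: featureID -> entity class.
    // "12109" is User.featureID from the question; "00000" is a placeholder.
    static final Map<String, Class<? extends Transaction>> TYPES = Map.of(
            "12109", User.class,
            "00000", LegalEntity.class);

    // Looks up the class for a featureID and calls its (String, String) constructor.
    static Transaction createForFeature(String featureID, String transactionID, String tenantID) {
        Class<? extends Transaction> type = TYPES.get(featureID);
        if (type == null) {
            throw new IllegalArgumentException("Unknown featureID: " + featureID);
        }
        try {
            return type.getDeclaredConstructor(String.class, String.class)
                    .newInstance(transactionID, tenantID);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Transaction t = createForFeature("12109", "tx-1", "tenant-A");
        System.out.println(t.getClass().getSimpleName() + " " + t.transactionID);
    }
}
```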

{
    // Fetch (or create) the parent node from the database
    LegalEntity parentNode = transactionRepository.getOrCreateNode(transactionMap, tenantID, relTransactionID, LegalEntity.class);
    RelatedTo<LegalEntity> newParentRelationship = new RelatedTo<>(effectiveFromTimestamp, effectiveTillTimestamp, status, parentNode, documentID);
    Set<RelatedTo<LegalEntity>> existingRelationships = user.getLegalEntities();
    // Some data processing, then the set above is updated
}
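Each relationship above starts with a `getOrCreateNode` call, and targets such as managers and legal entities are presumably shared by many users, so the same nodes may be fetched over and over. A hypothetical per-run cache could cut those repeated lookups; in real code the loader would wrap the repository call and the key could combine `tenantID` and `relTransactionID`. This assumes a single-threaded run and that cached instances stay in sync with what is stored, which needs checking when entities carry a `@Version` field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class NodeCache<T> {

    private final Map<String, T> cache = new HashMap<>();
    private final Function<String, T> loader; // e.g. a wrapper around the repository lookup
    private int loads = 0;

    public NodeCache(Function<String, T> loader) {
        this.loader = loader;
    }

    // Returns the cached node for key, calling the loader only the first time the key is seen.
    public T get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    // Number of times the loader (i.e. the database) was actually hit.
    public int loads() {
        return loads;
    }

    public static void main(String[] args) {
        NodeCache<String> legalEntities = new NodeCache<>(id -> "LegalEntity(" + id + ")");
        for (String id : new String[] {"le-1", "le-2", "le-1", "le-1"}) {
            legalEntities.get(id);
        }
        System.out.println(legalEntities.loads()); // 2
    }
}
```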

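The problem is that each save takes 4 to 10 seconds, including the processing, fetching and saving. Each node can have 10 to 15 relationships, so creating a single relationship takes several hundred milliseconds (roughly 270 ms to 1 s).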

Is there any way to speed this up? I have to import more than 50,000 documents from Couchbase.
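For scale, at the quoted 4 to 10 seconds per document, a single-threaded run over 50,000 documents works out to:

$$
50\,000 \times 4\ \text{s} = 200\,000\ \text{s} \approx 55.6\ \text{h}, \qquad 50\,000 \times 10\ \text{s} = 500\,000\ \text{s} \approx 138.9\ \text{h}
$$

that is, roughly 2.3 to 5.8 days.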

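Currently I update each node and its relationships in a single thread. I tried parallelizing this with an ExecutorService, but the save operations deadlock and throw optimistic locking exceptions or TransientDataAccessExceptions.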
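When parallel saves collide, one common mitigation is to retry the failed unit of work with a growing delay. A minimal sketch; `withRetry` is a hypothetical helper, and with Spring you would pass something like `OptimisticLockingFailureException.class` as `retryOn`. The action should reload the entity before modifying and saving it, otherwise every retry reuses the same stale version. Retrying does not remove the contention itself, so it is best combined with fewer, larger writes.

```java
import java.util.concurrent.Callable;

public class Retry {

    // Hypothetical helper: re-runs the action when it throws an exception of type retryOn,
    // doubling the wait after each failed attempt, up to maxAttempts attempts in total.
    // Any other exception, or the last failure, is rethrown to the caller.
    static <T> T withRetry(Callable<T> action, Class<? extends Exception> retryOn,
                           int maxAttempts, long initialDelayMs) throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated unit of work that hits a "lock conflict" twice before succeeding
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated lock conflict");
            }
            return "saved";
        }, IllegalStateException.class, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```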

spring-boot reflection neo4j spring-data-neo4j
Answer

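Spring Data is not a good fit for ETL: every record has to be deserialized into entity objects and serialized back again, which does not scale to large numbers of records.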
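If the import has to stay in Java, a commonly used middle ground is to bypass entity mapping and send relationships in batches through a parameterized `UNWIND` query, so one round trip writes many rows. A minimal sketch, assuming the nodes already exist and are looked up by a `TransactionID` property; that property name, the row keys and the batch size of 1000 are hypothetical. An index on the lookup property is what would keep the `MATCH` clauses cheap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchedRelationshipWrites {

    // Hypothetical Cypher: one call creates a whole batch of LegalEntity relationships.
    static final String CYPHER =
            "UNWIND $rows AS row "
            + "MATCH (u:User {TransactionID: row.userId}) "
            + "MATCH (le:LegalEntity {TransactionID: row.legalEntityId}) "
            + "MERGE (u)-[r:LegalEntity {DocumentID: row.documentID}]->(le) "
            + "SET r.EffectiveFromTimestamp = row.effectiveFrom, "
            + "r.EffectiveTillTimestamp = row.effectiveTill, "
            + "r.Status = row.status";

    // Splits items into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            result.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(Map.of("userId", "u-" + i, "legalEntityId", "le-" + (i % 10),
                    "documentID", "doc-" + i, "effectiveFrom", "", "effectiveTill", "",
                    "status", "Active"));
        }
        // Each batch would be sent as the $rows parameter, e.g. with the Neo4j Java driver 5.x:
        // session.executeWrite(tx -> tx.run(CYPHER, Map.of("rows", batch)).consume());
        for (List<Map<String, Object>> batch : batches(rows, 1000)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With 2,500 rows this yields three round trips (1000, 1000 and 500 rows) instead of one save per relationship.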

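It would be much faster to export the data from Couchbase to an intermediate format (for example CSV) and then import that into Neo4j (for example with the neo4j-admin tool).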
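As an illustration of the CSV route, relationship files for `neo4j-admin import` use `:START_ID`, `:END_ID` and `:TYPE` header columns, with any further columns becoming relationship properties; the IDs refer to the `:ID` column of the node files. A minimal sketch of writing such rows with RFC 4180-style quoting; the class, the sample IDs and the chosen property columns are hypothetical, and the import tool's quoting options are worth checking against real data. Note that `neo4j-admin import` is an offline bulk load into a new database, not an update of a running one.

```java
import java.util.StringJoiner;

public class Neo4jImportCsv {

    // Quotes a field if it contains a comma, a quote or a line break, doubling embedded quotes.
    static String field(String value) {
        if (value == null) {
            return "";
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Joins the escaped fields of one CSV row.
    static String row(String... values) {
        StringJoiner joiner = new StringJoiner(",");
        for (String v : values) {
            joiner.add(field(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Header: start node ID, end node ID, relationship type, then relationship properties
        System.out.println(row(":START_ID", ":END_ID", ":TYPE", "DocumentID", "Status"));
        System.out.println(row("u-1", "le-7", "LegalEntity", "doc-1", "Active"));
        System.out.println(row("u-2", "u-1", "LineManager", "doc-2", "On leave, returning"));
    }
}
```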
