Rendering Patients in SolR (Part-1)

Parisni

2018-12-02

Patient medical records contain a broad variety of data, such as numeric values, codes, time series and medical reports. A record can be represented as a JSON document with a nested structure.

SolR can index highly nested objects and query them efficiently. It provides pagination as well as faceting, both of which are very useful for building a patient-centered search engine.

Patients in electronic health records:

Consider the two patients below:

[
    {
        "id": "1",
        "ipp_s": "800001",
        "age_i": 19,
        "projects_ss": [
            "serv1",
            "gh1",
            "hop1"
        ],
        "content_type_s": "patient",
        "_childDocuments_": [
            {
                "id": "2",
                "patient_id": "1",
                "content_type_s": "visite",
                "nda_s": "224872810",
                "text_t": "hello world my friend",
                "_childDocuments_": [
                    {
                        "id": "6",
                        "content_type_s": "doc",
                        "doc_type_ss": [
                            "APHP.CRH-S",
                            "LOINC.XX-X"
                        ],
                        "doc_text_t": "babar aime la r\u00e9b\u00e9llion"
                    },
                    {
                        "id": "7",
                        "content_type_s": "doc",
                        "doc_type_ss": [
                            "APHP.CRH-H",
                            "LOINC.XX-Y"
                        ],
                        "doc_text_t": "popeye aime la marche"
                    },
                    {
                        "id": "8",
                        "content_type_s": "acte",
                        "ccam_ss": [
                            "XX1",
                            "XX2",
                            "XX3"
                        ],
                        "cim10_ss": [
                            "YY1",
                            "YY2",
                            "YY3"
                        ]
                    },
                    {
                        "id": "9",
                        "content_type_s": "diag",
                        "cim10_ss": [
                            "YY1",
                            "YY2",
                            "YY3"
                        ]
                    }
                ]
            },
            {
                "id": "3",
                "patient_id": "1",
                "content_type_s": "visite",
                "nda_s": "224872811",
                "ccam_ss": [
                    "XX4"
                ],
                "cim10_ss": [
                    "YY4"
                ]
            }
        ]
    },
    {
        "id": "4",
        "age_i": 31,
        "content_type_s": "patient",
        "labo_txt": ["AAX/000018","BBX/000019"],
        "crh.section1_txt": ["mon patient a ceci et cela"],
        "crh.section2_txt": ["ma patiente n'a pas cela", "elle a une banane"],
        "projects_ss": [
            "serv1",
            "gh1",
            "hop1"
        ],
        "_childDocuments_": [
            {
                "id": "5",
                "content_type_s": "visite",
                "nda_s": "224872812",
                "ccam_ss": [
                    "XX5"
                ],
                "cim10_ss": [
                    "YY5"
                ]
            }
        ]
    }
]
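The nesting is easier to see once it is flattened. The sketch below walks a trimmed copy of the first patient (same ids and `content_type_s` values, other fields dropped) and prints every document with its depth and parent. The names `PATIENT` and `walk` are made up for this illustration and are not part of SolR.

```python
import json

# Trimmed copy of the first patient above: ids and content types only.
PATIENT = json.loads("""
{
  "id": "1", "content_type_s": "patient",
  "_childDocuments_": [
    {"id": "2", "content_type_s": "visite",
     "_childDocuments_": [
       {"id": "6", "content_type_s": "doc"},
       {"id": "7", "content_type_s": "doc"},
       {"id": "8", "content_type_s": "acte"},
       {"id": "9", "content_type_s": "diag"}
     ]},
    {"id": "3", "content_type_s": "visite"}
  ]
}
""")


def walk(doc, parent_id=None, depth=0):
    """Yield (depth, id, content_type, parent_id) for a document and its descendants."""
    yield depth, doc["id"], doc["content_type_s"], parent_id
    for child in doc.get("_childDocuments_", []):
        yield from walk(child, doc["id"], depth + 1)


rows = list(walk(PATIENT))
for depth, doc_id, kind, parent_id in rows:
    # Indent by depth so the patient / visit / event levels are visible.
    print("  " * depth + f"{kind} id={doc_id} parent={parent_id}")
```

The patient sits at depth 0, its visits at depth 1, and the reports, acts and diagnoses of a visit at depth 2.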

It is already possible to ask SolR relevant questions:

# Count the matches and return one patient who has, during the same visit:
# - one XX1 code
# - one report containing "popeye"
curl http://localhost:8983/solr/patient/query -d '
q=*:*
&fq={!parent which="content_type_s:visite"}
    ccam_ss:XX1
&fq={!parent which="content_type_s:visite"}
    {!complexphrase inOrder=true}
        doc_text_t:"popeye marche"~3
        AND doc_type_ss:APHP.CRH-H
&fl=id
&rows=1'
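The same request can be assembled programmatically, which keeps the two `fq` filters readable and lets the library handle URL encoding. This is only a sketch that mirrors the curl command above: `build_params` and `VISIT_FILTER` are hypothetical names, and the filter strings are copied from the query as written in this post.

```python
from urllib.parse import urlencode, parse_qsl

# Block join prefix used by both filters: match visits through their children.
VISIT_FILTER = '{!parent which="content_type_s:visite"}'


def build_params(ccam_code, phrase, doc_type, slop=3):
    """Return the query parameters as a list of (name, value) pairs."""
    child_text_query = (
        "{!complexphrase inOrder=true}"
        f'doc_text_t:"{phrase}"~{slop} AND doc_type_ss:{doc_type}'
    )
    return [
        ("q", "*:*"),
        ("fq", f"{VISIT_FILTER}ccam_ss:{ccam_code}"),
        ("fq", f"{VISIT_FILTER}{child_text_query}"),
        ("fl", "id"),
        ("rows", "1"),
    ]


params = build_params("XX1", "popeye marche", "APHP.CRH-H")
body = urlencode(params)  # form-encoded body, suitable for a POST request
print(body)
```

The resulting `body` can be sent to the same `/solr/patient/query` endpoint used above. A list of pairs is used rather than a dict because the `fq` parameter appears twice.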

Block join queries have some limitations:

- it is not possible to highlight matching text from a parent or a child document
- they have some impact on performance

The particular model used in this post also has some limitations:

- it is not possible to highlight matching text at the patient level
- it is not possible to work with a sequence of events, such as encounters
- queries are complicated

Let's look at another approach in the next post.
