C#: How do I read parts of a file? (DICOM)

Question

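I want to read DICOM files in C#. I don't want to do anything fancy. For now I just want to know how to read the elements, but first I'd like to know how to read the header to check whether it is a valid DICOM file.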

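A DICOM file consists of binary data elements. The first 128 bytes are unused (set to zero), followed by the string "DICM". After that comes the header information, which is organized into groups.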

Example DICOM header

First 128 bytes: unused by the DICOM format.
Followed by the characters "D", "I", "C", "M"
Followed by additional header information, such as:

0002,0000, File Meta Elements Group Length: 132
0002,0001, File Meta Info Version: 256
0002,0010, Transfer Syntax UID: 1.2.840.10008.1.2.1
0008,0000, Identifying Group Length: 152
0008,0060, Modality: MR
0008,0070, Manufacturer: MRIcro

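In the example above, the header is organized into groups. Group 0002 (hex) is the file meta information group and contains 3 elements: one defines the group length, one stores the file version, and one stores the transfer syntax.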
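To make the "gggg,eeee" notation concrete: in the file meta group (and in any little-endian transfer syntax) a tag such as 0002,0010 is stored on disk as the 16-bit group number followed by the 16-bit element number, both little-endian. A minimal Python sketch using only the standard `struct` module; the function names are illustrative, not from any DICOM library:

```python
import struct
from typing import Tuple

def encode_tag(group: int, element: int) -> bytes:
    """Pack a (group, element) tag as two little-endian unsigned 16-bit ints."""
    return struct.pack("<HH", group, element)

def decode_tag(raw: bytes) -> Tuple[int, int]:
    """Unpack the first 4 bytes of raw into a (group, element) tuple."""
    group, element = struct.unpack_from("<HH", raw, 0)
    return group, element

def format_tag(group: int, element: int) -> str:
    """Render a tag in the 'gggg,eeee' hex notation used in the listing."""
    return f"{group:04X},{element:04X}"

# Transfer Syntax UID tag 0002,0010 as it appears in the file
raw = encode_tag(0x0002, 0x0010)
print(raw.hex())                     # 02001000
print(format_tag(*decode_tag(raw)))  # 0002,0010
```

So when scanning the bytes after "DICM", the first four bytes of each element are read as two `uint16` values, which is exactly what `ReadUInt16()` does in the C# answers.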

Questions

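How do I read the header and verify that the file is a DICOM file by checking for the "D", "I", "C", "M" characters after the 128-byte preamble?
How do I then continue parsing the file to read the data in the other parts?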

Answer 1

Something like this should read the file. It is basic and doesn't handle every case, but it's a starting point:


using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class DicomFileReader
{
    public void ReadFile(string filename)
    {
        using (FileStream fs = File.OpenRead(filename))
        using (BinaryReader reader = new BinaryReader(fs))
        {
            // Skip the 128-byte preamble and check for the "DICM" prefix
            fs.Seek(128, SeekOrigin.Begin);
            byte[] magic = reader.ReadBytes(4);
            if (Encoding.ASCII.GetString(magic) != "DICM")
            {
                Console.WriteLine("Not a DCM");
                return;
            }

            // In explicit VR these VRs have a 2-byte length field; all other VRs
            // (OB, OW, OF, SQ, UT, UN, ...) have 2 reserved bytes and a 4-byte length
            var shortLengthVRs = new HashSet<string> {
                "AE", "AS", "AT", "CS", "DA", "DS", "DT", "FL", "FD", "IS", "LO",
                "LT", "PN", "SH", "SL", "SS", "ST", "TM", "UI", "UL", "US" };

            // Group 0002 (file meta information) is always explicit VR little endian
            while (fs.Position + 4 <= fs.Length)
            {
                ushort g = reader.ReadUInt16();
                ushort e = reader.ReadUInt16();
                if (g != 0x0002)
                {
                    // First element of the dataset: rewind so it can be parsed
                    // according to the transfer syntax stored in (0002,0010)
                    fs.Seek(-4, SeekOrigin.Current);
                    break;
                }

                string vr = Encoding.ASCII.GetString(reader.ReadBytes(2));
                long length;
                if (shortLengthVRs.Contains(vr))
                {
                    length = reader.ReadUInt16();
                }
                else
                {
                    // Skip the 2 reserved bytes, then read the 4-byte length
                    reader.ReadUInt16();
                    length = reader.ReadUInt32();
                }

                byte[] val = reader.ReadBytes((int)length);
            }
        }
    }
}

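The code doesn't try to account for the fact that the transfer syntax used to encode the data may change after the group 2 elements, and it doesn't do anything with the values it reads.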
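The main thing worth keeping from the group 2 loop is the Transfer Syntax UID (0002,0010), because it tells you how the rest of the file is encoded. Below is a minimal Python sketch of the same loop over an in-memory byte string. It assumes, as the standard requires, that the meta group is explicit VR little endian; the function names are illustrative. The short-length set covers the classic VRs, and newer ones such as UC, UR, OD or OL use the 4-byte form, so they fall through to the `else` branch:

```python
import struct
from typing import Dict, Tuple

# In explicit VR these VRs use a 2-byte length field; all other VRs
# (OB, OW, OF, SQ, UT, UN, ...) use 2 reserved bytes plus a 4-byte length.
SHORT_LENGTH_VRS = {
    "AE", "AS", "AT", "CS", "DA", "DS", "DT", "FL", "FD", "IS", "LO",
    "LT", "PN", "SH", "SL", "SS", "ST", "TM", "UI", "UL", "US",
}

def read_file_meta(data: bytes) -> Tuple[Dict[Tuple[int, int], bytes], int]:
    """Parse the group 0002 elements that follow the preamble and 'DICM'.

    Returns a dict mapping (group, element) -> raw value bytes, and the
    offset of the first element after the file meta group.
    """
    if data[128:132] != b"DICM":
        raise ValueError("missing DICM prefix")
    pos = 132
    meta = {}
    while pos + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, pos)
        if group != 0x0002:
            break  # dataset starts here; its encoding depends on the transfer syntax
        vr = data[pos + 4:pos + 6].decode("ascii")
        if vr in SHORT_LENGTH_VRS:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            pos += 8
        else:
            # Skip the 2 reserved bytes at pos + 6, read the 4-byte length
            (length,) = struct.unpack_from("<I", data, pos + 8)
            pos += 12
        meta[(group, element)] = data[pos:pos + length]
        pos += length
    return meta, pos

def transfer_syntax(meta: Dict[Tuple[int, int], bytes]) -> str:
    """Transfer Syntax UID (0002,0010); UI values may be padded with a trailing NUL."""
    return meta[(0x0002, 0x0010)].rstrip(b"\x00 ").decode("ascii")
```

With the UID in hand you can decide how to read from the returned offset on, for example 1.2.840.10008.1.2.1 (explicit VR little endian, as in the question's example) versus 1.2.840.10008.1.2 (implicit VR little endian).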

Answer 2

Just some pseudo-logic:

How do I read the header and verify that the file is a DICOM file by checking for the "D", "I", "C", "M" characters after the 128-byte preamble?

Open the file as binary with File.OpenRead.
Seek to position 128, read 4 bytes into an array and compare them with the byte[] value of "DICM". You can get that byte array with ASCIIEncoding.GetBytes().
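The same two steps in Python, as a minimal sketch (the function name is illustrative). The `with` block closes the file even if reading fails, and a file shorter than 132 bytes simply yields fewer than 4 bytes, so the comparison is False:

```python
DICM_MAGIC = "DICM".encode("ascii")  # the same bytes ASCIIEncoding.GetBytes("DICM") gives in .NET

def has_dicm_prefix(path) -> bool:
    """Return True if the 4 bytes at offset 128 of the file are 'DICM'."""
    with open(path, "rb") as f:  # open as binary
        f.seek(128)              # skip the 128-byte preamble
        return f.read(4) == DICM_MAGIC
```

This only checks the prefix and does not look at the preamble contents.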

How do I then continue parsing the file to read the data in the other parts?

Keep reading the file with Read or ReadByte on the FileStream handle you already have.
Compare values using the same method as above.
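Continuing with the same handle means reading element headers one after another, and what a header looks like depends on the transfer syntax. Answer 1 shows the explicit VR form; as a contrast, here is a hedged Python sketch for implicit VR little endian (transfer syntax 1.2.840.10008.1.2), where no VR is stored and the length is always 4 bytes. It assumes the handle is already positioned at the start of the dataset and does not handle sequences with undefined length; the function names are illustrative:

```python
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

def read_implicit_vr_header(f: BinaryIO) -> Optional[Tuple[int, int, int]]:
    """Read one implicit VR little endian element header at the current position.

    On disk: group (uint16), element (uint16), value length (uint32), all
    little-endian, with no VR field. Returns None at end of file.
    """
    header = f.read(8)
    if len(header) < 8:
        return None
    return struct.unpack("<HHI", header)

def iter_implicit_vr_elements(f: BinaryIO) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (group, element, value) for a flat dataset; sequences are not handled."""
    while (header := read_implicit_vr_header(f)) is not None:
        group, element, length = header
        if length == 0xFFFFFFFF:
            raise NotImplementedError("undefined length (sequence) not supported")
        yield group, element, f.read(length)
```

Because it only calls `read`, it works on an open file as well as on an in-memory `io.BytesIO`.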

Don't forget to close and dispose of the file.

Answer 3

You can also do it like this:

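using (FileStream fs = File.OpenRead(path))
{
    byte[] data = new byte[132];
    int read = fs.Read(data, 0, data.Length);

    // C# bytes are already unsigned (0-255), so no "& 255" masking is needed
    byte b0 = data[0], b1 = data[1], b3 = data[3];

    if (read == 132 && data[128] == 'D' && data[129] == 'I' && data[130] == 'C' && data[131] == 'M')
    {
        // DICOM file: 128-byte preamble followed by "DICM"
    }
    else if (read >= 4 && (b0 == 8 || b0 == 2) && b1 == 0 && b3 == 0)
    {
        // Probably a DICOM file without preamble: it starts with a
        // little-endian tag in group 0x0008 or 0x0002
    }
}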
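The second branch is a heuristic for DICOM data written without the preamble and "DICM" prefix: the file then starts directly with a little-endian tag, typically in group 0x0002 or 0x0008, whose element number is below 0x100. A Python version of both checks, written as a pure function over the first bytes of the file so it can be tested without files (the name and return labels are illustrative). It is a guess, not validation: any file that happens to start with those byte values is classified as "raw":

```python
def sniff_dicom(header: bytes) -> str:
    """Classify the first bytes of a file like the C# snippet above.

    'part10' : 128-byte preamble followed by 'DICM'
    'raw'    : no preamble, but starts with a little-endian tag in group
               0x0002 or 0x0008 whose element number is below 0x100
    'unknown': neither pattern matched
    """
    if len(header) >= 132 and header[128:132] == b"DICM":
        return "part10"
    if len(header) >= 4 and header[0] in (0x02, 0x08) and header[1] == 0 and header[3] == 0:
        return "raw"
    return "unknown"
```

Reading the first 132 bytes once and passing them in keeps the file I/O in one place.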

Answer 4

Taken from EvilDicom.Helper.DicomReader in the Evil DICOM library:

public static bool IsValidDicom(BinaryReader r)
{
    try
    {
        // 128-byte preamble. Note: the DICOM standard only requires these bytes to be
        // zero when the preamble is unused, so this check is stricter than the standard
        byte[] nullBytes = new byte[128];
        r.Read(nullBytes, 0, 128);
        foreach (byte b in nullBytes)
        {
            if (b != 0x00)
            {
                //Not valid
                Console.WriteLine("Missing 128 null byte preamble. Not a valid DICOM file!");
                return false;
            }
        }
    }
    catch (Exception)
    {
        Console.WriteLine("Could not read 128 null byte preamble. Perhaps file is too short");
        return false;
    }

    try
    {
        //4 DICM characters
        char[] dicm = new char[4];
        r.Read(dicm, 0, 4);
        if (dicm[0] != 'D' || dicm[1] != 'I' || dicm[2] != 'C' || dicm[3] != 'M')
        {
            //Not valid
            Console.WriteLine("Missing characters D I C M in bytes 128-131. Not a valid DICOM file!");
            return false;
        }
        return true;
    }
    catch (Exception)
    {
        Console.WriteLine("Could not read DICM letters in bytes 128-131.");
        return false;
    }
}

Answer 5

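This code reads the most important DICOM tags and stores them in JSON and Excel. It identifies the studies in a given directory by folder name, DICOM study ID and date. For each study it creates a dictionary with the most useful information, plus a dictionary of DICOM series (each DICOM study has many DICOM images, and each image belongs to a series). Below is the full code with a description. You can add or remove whatever information you want.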
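A stripped-down sketch of the dictionary layout described above, with the pydicom reading left out so the structure itself is visible. The helper `add_series` and the sample values are hypothetical; the study key mirrors the `study_unique` string built in the code that follows:

```python
def add_series(dicom_data: dict, root_name: str, first_subfolder: str,
               study_id: str, study_date: str, series_number: int,
               study_info: dict, series_info: dict) -> None:
    """Insert one series into {root_name: {study_unique: {..., 'image_dicom_data_list': {...}}}}."""
    studies = dicom_data.setdefault(root_name, {})
    study_unique = f"{first_subfolder}_{study_id}_{study_date}"
    # Study-level fields are stored only the first time this study is seen
    study = studies.setdefault(study_unique, {**study_info, "image_dicom_data_list": {}})
    # Series-level fields are keyed by series number (0020,0011)
    study["image_dicom_data_list"][series_number] = series_info
```

Calling it once per file gives one entry per study, with later files of the same study only adding or overwriting entries in its series dictionary.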

#FINAL 20231216
#My context: I wrote this on Windows 11 with an RTX 3080 Ti, a 12th-gen Core i9 and 32 GB RAM, using VS Code and a Jupyter notebook.
#Your requirement: it doesn't need any exceptional hardware; you can run it on an average PC/laptop.

import pydicom as pm #for reading DICOM files
import os #for looping through system directories
from pydicom.multival import MultiValue #for converting multi-valued DICOM elements
from pydicom.valuerep import PersonName #needed because converting the dictionary to JSON otherwise raises an error
from tqdm.notebook import tqdm #for the fancy loop progress bar
import pandas as pd #for turning the dictionary into Excel; it is first transformed into a pandas DataFrame
import json #for storing as JSON

from IPython.display import HTML, display #so you can click on the stored Excel and JSON files and open them from the Jupyter notebook

def get_dicom_tag_value(dicom_file, tag, default=None):
    '''this function gets the value of the given tag (group, element) from the DICOM file'''
    data_element = dicom_file.get(tag, None)
    if data_element is None:
        return default
    value = data_element.value
    if isinstance(value, MultiValue):
        return list(value)  # Convert MultiValue to list
    return value

def get_path_to_first_subfolder(full_path, first_subfolder):
    """this will get the path to the first folder of root, which is the subfolder that contains all dicom filed of one dicom study """
    path_parts = full_path.split(os.sep)
    if first_subfolder in path_parts:
        subfolder_index = path_parts.index(first_subfolder)
        return os.sep.join(path_parts[:subfolder_index + 1])
    else:
        return full_path

def count_subfolders(directory):
    '''this counts the number of files and folders within a directory'''
    total_subfolders = 0
    total_files=0
    for root, dirs, files in os.walk(directory):
        total_subfolders += len(dirs)
        total_files += len(files)
    return total_subfolders,total_files 


class CustomJSONEncoder(json.JSONEncoder): #this class will turn our multilevel dictionary into a json file
    def default(self, obj):
        if isinstance(obj, MultiValue):
            return list(obj)  # Convert MultiValue to list
        elif isinstance(obj, PersonName):
            return str(obj)   # Convert PersonName to string
        return json.JSONEncoder.default(self, obj)

def ensure_json_extension(directory): 
    '''ensures that the defined JSON path ends with .json; otherwise it appends JSON.json to the defined directory'''
    if not directory.endswith(".json"):
        return os.path.join(directory, "JSON.json")
    return directory

def ensure_excel_extension(directory):
    '''ensures that the defined Excel path ends with .xlsx; otherwise it appends excel.xlsx to the defined directory'''
    if not directory.endswith(".xlsx"):
        return os.path.join(directory, "excel.xlsx")
    return directory

def create_clickable_dir_path(dir_path):
    # Wrap the path in an HTML link so it can be clicked in the Jupyter output
    return HTML(f'<a href="{dir_path}" target="_blank">{dir_path}</a>')



def get_dicomdir_give_dicomdicom_datadic(dicom_dir, #directory you want to read; DICOM studies should usually be in one folder, preferably named with a unique patient ID/name
                                     dicom_validation=True, #checks whether each file in the loop is DICOM. It makes the loop slower, but I recommend it so that only DICOM files go through the loop
                                     folder_list_name_indicomdir=None, #optional list of folder names inside dicom_dir to read; other folders are skipped. Note that this only looks at the first-level subfolders of dicom_dir, not at subfolders of subfolders :)
                                     store_as_json_dir=None, #if you want to store the dictionary as JSON, give the desired JSON path
                                     store_as_excel_dir=None #if you want to store the dictionary as Excel, give the desired Excel path
                                     ):
    """
    This function creates a multi-level dictionary for DICOM meta data (named dicom_data) in a directory (named dicom_dir).
    The top level has the last component of dicom_dir, which is the first level subfolder, as a key.
    For each subforled it will store study data within this dic, along with another dicitonary for series data, within this study dictionary.
    For series dictionary the data corresponding for series number will be stored.
    We also have another private_info dictionary within subfodler dictionary.
    
    - dicom_validation: If you set dicom_validation=True, it will validate the file in the loop for being an dicom file. This is super important although it makes code slower.
    Becaouse, sometimes some dicom files have no extension, and also reading other files may cause error in the loop.
    
    - folder_list_name_indicomdir: #In your dicom_dir you can include list of folders name that you want to read. It will not read other folders. Kepp in mind that this will look into subfolders in the main folder, and not the subfolders of subfolders :)
    
    - store_as_json_dir: if you want to store your ditionary as json, give your desired json direcotry
    
    - store_as_excel_dir: if you want to store your ditionary as excel, give your desired excel direcotry
    
    For using this function, the best practice is to place each folder containing one dicom study in subfolder, under the dicom_dir. 
    However, you can change finding unique dicom studies, even placed next to each other beacouse I definied the study_unique=f'{first_subfolder}_{study_id}_{study_date}'.
    If you want your code to be faster you can chane the study_unique to study_unique=first_subfolder. It makes your code 15% faster, sometimes at the cost of incurrect retrival.
    
    """

    total_subfolder,total_files=count_subfolders(dicom_dir)
    print(f'your directory contains {total_subfolder} folders and {total_files} files')
    
    last_dir_name = os.path.basename(os.path.normpath(dicom_dir))
    dicom_data = {last_dir_name: {}}

    for root, dirs, files in tqdm(os.walk(dicom_dir), desc="Processing directories", total=total_subfolder,unit='folder'):
        if folder_list_name_indicomdir:
            split_path = root.replace(dicom_dir, '').split(os.sep)
            first_subfolder = split_path[1] if len(split_path) > 1 else ""
            if first_subfolder not in folder_list_name_indicomdir:
                print(f"""The folder {first_subfolder} was not in your definied list.""")
                continue  # Skip if the first subfolder is not in the user-defined list
            
        for file in files:
            if dicom_validation and not pm.misc.is_dicom(os.path.join(root, file)):
                continue # Skip if it is not a DICOM file
                   

            try:
                dicom_file = pm.dcmread(os.path.join(root, file))
                study_id = get_dicom_tag_value(dicom_file, (0x0020, 0x0010))
                dicom_data_number = get_dicom_tag_value(dicom_file, (0x0020, 0x0011))
                study_date = get_dicom_tag_value(dicom_file, (0x0008, 0x0020))
                split_path = root.replace(dicom_dir, '').split(os.sep)
                first_subfolder = split_path[1] if len(split_path) > 1 else ""
                if study_id and dicom_data_number and study_date:
                    study_unique = f'{first_subfolder}_{study_id}_{study_date}' #for more speed you can change this to study_unique=first_subfolder
                    if study_unique not in dicom_data[last_dir_name]:
                        private_info={'name': get_dicom_tag_value(dicom_file, (0x0010, 0x0010)),
                                      'institute': get_dicom_tag_value(dicom_file, (0x0008, 0x0080)),
                                      'patient_id': get_dicom_tag_value(dicom_file, (0x0010, 0x0020)),
                                      'accession_number':get_dicom_tag_value(dicom_file, (0x0008, 0x0050))
                                      }
                        
                        dicom_data[last_dir_name][study_unique] = {
                            'dir_to_root': get_path_to_first_subfolder(root, first_subfolder),
                            'study_description': get_dicom_tag_value(dicom_file, (0x0008, 0x1030)),
                            'date': study_date,
                            'age': get_dicom_tag_value(dicom_file, (0x0010, 0x1010)),
                            'sex': get_dicom_tag_value(dicom_file, (0x0010, 0x0040)),
                            'manufacture_model': get_dicom_tag_value(dicom_file, (0x0008, 0x1090)),
                            'manufacture_brand': get_dicom_tag_value(dicom_file, (0x0008, 0x0070)),
                            'protocol': get_dicom_tag_value(dicom_file, (0x0018, 0x1030)),
                            'study_id': study_id,
                            'patient_weight': get_dicom_tag_value(dicom_file, (0x0010, 0x1030)),
                            'Image_type': get_dicom_tag_value(dicom_file, (0x0008, 0x0008)),
                            'body_part': get_dicom_tag_value(dicom_file, (0x0018, 0x0015)),
                            'modality': get_dicom_tag_value(dicom_file, (0x0008, 0x0060)),
                            'private_info':private_info,
                            'image_dicom_data_list': {}
                        }

                    

                    dicom_data_info = {
                        'dicom_data_description': get_dicom_tag_value(dicom_file, (0x0008, 0x103E)),
                        'body_part': get_dicom_tag_value(dicom_file, (0x0018, 0x0015)),
                        'slice_thickness': get_dicom_tag_value(dicom_file, (0x0018, 0x0050)),
                        'Image_comment': get_dicom_tag_value(dicom_file, (0x0020, 0x4000)),
                        'kvp': get_dicom_tag_value(dicom_file, (0x0018, 0x0060)),
                        'exposure': get_dicom_tag_value(dicom_file, (0x0018, 0x1152)),
                        'exposure_time': get_dicom_tag_value(dicom_file, (0x0018, 0x1150)),
                    }
                    dicom_data[last_dir_name][study_unique]['image_dicom_data_list'][dicom_data_number] = dicom_data_info

            except Exception as e:
                print(f"""Error reading for {file}::: {e} \n """)
                continue
            
    if store_as_json_dir is not None:
        try:
            json_read = json.dumps(dicom_data, indent=4, cls=CustomJSONEncoder)
            store_as_json_dir=str(store_as_json_dir)
            store_as_json_dir=ensure_json_extension(store_as_json_dir)
            with open(store_as_json_dir, 'w', encoding='utf-8') as json_file:
                json_file.write(json_read)
            print("JSON stored at :::")
            display(create_clickable_dir_path(store_as_json_dir))         
        except Exception as e:
            print(f"""Error storing the JSON ::: {e} \n """)
            
    if store_as_excel_dir is not None:
        try:
            dataframes = []
            for key, value in dicom_data.items():
                # Convert value to DataFrame if necessary
                df = pd.DataFrame(value)
                # Add the key as a new column or as part of the index
                df['Key'] = key  # Add key as a column
                # df = df.set_index(['Key'], append=True)  # Add key as part of a MultiIndex
                dataframes.append(df)

            # Concatenate all dataframes
            df2 = pd.concat(dataframes).T
            store_as_excel_dir=str(store_as_excel_dir)
            store_as_excel_dir=ensure_excel_extension(store_as_excel_dir)
            df2.to_excel(store_as_excel_dir)
            print(f"""Excel stored at :::""")
            display(create_clickable_dir_path(store_as_excel_dir))          
        except Exception as e:
            print(f"""Error storing the Excel ::: {e} \n """)
            
                                 
    return dicom_data


#example of running code
dicom_dir=r"F:\Data\Big Pancreas (CT, EUS)\Raw Data Hospital\Dr Radmard\Valid Case" 
save_dir_json=r'F:\Data\Big Pancreas (CT, EUS)\Raw Data Hospital\Radmard_all_dcm.json'
save_dir_xlsx=r'F:\Data\Big Pancreas (CT, EUS)\Raw Data Hospital\Radmard_all_dcm.xlsx'


dicom_dic=get_dicomdir_give_dicomdicom_datadic(
    dicom_dir,                          #directory you want to read
    dicom_validation=True,              #only let DICOM files through the loop (slower but safer)
    folder_list_name_indicomdir=None,   #optional list of first-level subfolders to read
    store_as_json_dir=save_dir_json,    #path for the JSON output
    store_as_excel_dir=save_dir_xlsx    #path for the Excel output
    )
