SAP ABAP: Upload CSV File Into Internal Table

I'd like to present a different approach that uses find( ) heavily. Compared to your line-based approach, it seems to have equivalent performance for unquoted fields, but performs slightly better if quoted fields are present:

[Performance comparison table: parse times of the line-based and find-based parsers for the simple, long, longer, complex and mixed test files]

In general, this uses the pattern position = find( off = position + 1 ) to iterate over the string in chunks, and then uses substring to copy ranges into strings. What can be observed here is that in a loop that iterates a million times, every nanosecond saved has an impact on performance, and by moving as much work as possible out of the inner loop one can increase performance significantly. For the "simple" case of 10 digit fields one can see that both algorithms perform equally well; however, for "longer" 30 digit fields your algorithm gets faster in comparison. For fields with quotes, the scan & concat approach I've used seems to be faster than the "reconstruct" approach. I guess that although one can achieve small gains through more clever ABAP, further significant optimizations are only possible by utilizing the engine even more.
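To make the pattern concrete, here is a minimal sketch of the find/substring chunking loop in isolation. It is my own illustration, not part of the parser below, and it assumes a single comma-separated line with no quoted fields and no newlines:

DATA(content) = `a,b,c`.         " hypothetical sample line, no quotes, no newline
DATA(length)  = strlen( content ).
DATA position TYPE i.
DATA fields   TYPE string_table.

WHILE position < length.
  " find the next separator, starting at the current offset
  DATA(next) = find( val = content off = position sub = `,` ).
  IF next = -1.
    next = length.               " last field runs to the end of the string
  ENDIF.
  " copy the range between separators into its own string
  APPEND substring( val = content off = position len = next - position ) TO fields.
  position = next + 1.           " continue after the separator
ENDWHILE.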

Anyway, here's the algorithm:

CLASS lcl_csv_parser_find IMPLEMENTATION.
  METHOD parse.
    DATA line TYPE string_table.
    DATA position TYPE i.
    DATA(string_length) = strlen( i_string ).

    " Dereferencing member fields is slightly slower than variable access, in a tight loop this matters
    DATA(separators) = me->separators.
    DATA(delimiter)  = me->delimiter.

    CHECK string_length <> 0.

    " Checking for delimiters in the DO loop is quite slow.
    " By scanning the whole file once and skipping that check if no delimiter is present,
    " this leads to a slight performance increase of 1s for 1 million rows
    DATA(next_delimiter) = find( val = i_string sub = delimiter ).

    DO.
      DATA(start_position) = position.
      DATA(field) = ``.
      " Check if the field is enclosed in double quotes, as we then need to unescape it
      IF next_delimiter <> -1 AND i_string+position(1) = delimiter.

        start_position = start_position + 1. " literal starts after opening quote

        DO.
          position = find( val = i_string off = position + 1 sub = delimiter ).
          " literal must be closed
          " ASSERT position <> -1.

          DATA(subliteral_length) = position - start_position.
          field = field && substring( val = i_string off = start_position len = subliteral_length ).

          DATA(following_position) = position + 1.
          IF position = string_length OR i_string+following_position(1) <> delimiter.
            " End of literal is reached
            position = position + 1. " skip closing quote
            EXIT. " DO
          ELSE.
            " Found escape quote instead
            position = following_position + 1.
            field = field && me->delimiter.
            " continue searching
          ENDIF.

          " ASSERT sy-index < 1000.
        ENDDO.
      ELSE.
        " Unescaped field, simply find the ending comma or newline
        position = find_any_of( val = i_string off = position + 1 sub = separators ).

        IF position = -1.
          position = string_length.
        ENDIF.

        field = substring( val = i_string off = start_position len = position - start_position ).
      ENDIF.

      APPEND field TO line.

      " Check if line ended and a new line is started
      DATA(current) = substring( val = i_string off = position len = 2 ).
      IF current = me->line_separator.
        APPEND line TO r_result.
        CLEAR line.
        position = position + 2. " skip newline
      ELSE.
        " ASSERT i_string+position(1) = me->separator.
        position = position + 1.
      ENDIF.

      " Check if file ended
      IF position >= string_length.
        RETURN.
      ENDIF.

      " ASSERT sy-index < 100000001.
    ENDDO.

  ENDMETHOD.
ENDCLASS.

As a side note, instead of creating a huge table of string fields as stated in #1, I would experiment with some kind of "visitor pattern", e.g. pass an instance of such an interface to the parser:

INTERFACE if_csv_visitor.
  METHODS begin_line.
  METHODS end_line.
  METHODS visit_field
    IMPORTING
      i_field TYPE string.
ENDINTERFACE.

As in a lot of cases you'll write the CSV fields into a structure anyway, and thus one can save allocating this quite big table.
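For illustration, here is a minimal sketch of what such a visitor could look like. lcl_struct_visitor and its two-column target structure are hypothetical and not part of the original answer, and the parser would of course have to be extended to call the visitor instead of building the string matrix:

" Hypothetical visitor implementation that fills a typed internal table directly,
" instead of materializing the big string matrix first.
CLASS lcl_struct_visitor DEFINITION.
  PUBLIC SECTION.
    INTERFACES if_csv_visitor.
    TYPES: BEGIN OF t_row,               " assumed two-column target structure
             field1 TYPE string,
             field2 TYPE string,
           END OF t_row,
           t_rows TYPE STANDARD TABLE OF t_row WITH EMPTY KEY.
    DATA rows TYPE t_rows READ-ONLY.
  PRIVATE SECTION.
    DATA current_row   TYPE t_row.
    DATA current_index TYPE i.
ENDCLASS.

CLASS lcl_struct_visitor IMPLEMENTATION.
  METHOD if_csv_visitor~begin_line.
    CLEAR: current_row, current_index.
  ENDMETHOD.

  METHOD if_csv_visitor~visit_field.
    " write the field into the next component of the target structure
    current_index = current_index + 1.
    ASSIGN COMPONENT current_index OF STRUCTURE current_row TO FIELD-SYMBOL(<component>).
    IF sy-subrc = 0.
      <component> = i_field.
    ENDIF.
  ENDMETHOD.

  METHOD if_csv_visitor~end_line.
    APPEND current_row TO rows.
  ENDMETHOD.
ENDCLASS.

In the parse method, APPEND field TO line could then become a call to visit_field, with begin_line and end_line replacing the handling of the line table.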


And for further reference, here's the whole report:

*&---------------------------------------------------------------------*
*& Report Z_CSV
*&---------------------------------------------------------------------*
*&
*&---------------------------------------------------------------------*
REPORT Z_CSV.

* --------------------- Generic CSV Parser ----------------------------*

CLASS lcl_csv_parser DEFINITION ABSTRACT.

  PUBLIC SECTION.
    TYPES:
      t_string_matrix TYPE STANDARD TABLE OF string_table WITH EMPTY KEY.

    METHODS:
      parse ABSTRACT
        IMPORTING
          i_string        TYPE string
        RETURNING
          VALUE(r_result) TYPE t_string_matrix,
      constructor
        IMPORTING
          i_delimiter      TYPE string DEFAULT '"'
          i_separator      TYPE string DEFAULT ','
          i_line_separator TYPE abap_cr_lf DEFAULT cl_abap_char_utilities=>cr_lf.

  PROTECTED SECTION.
    DATA:
      delimiter         TYPE string,
      separator         TYPE string,
      line_separator    TYPE string,
      escaped_delimiter TYPE string,
      separators        TYPE string.

ENDCLASS.

CLASS lcl_csv_parser IMPLEMENTATION.
  METHOD constructor.
    me->delimiter = i_delimiter.
    me->separator = i_separator.
    me->line_separator = i_line_separator.
    me->escaped_delimiter = |{ i_delimiter }{ i_delimiter }|.
    me->separators = i_separator && i_line_separator.
  ENDMETHOD.
ENDCLASS.

* --------------------------- Line based CSV Parser ------------------------ *

CLASS lcl_csv_parser_line DEFINITION INHERITING FROM lcl_csv_parser.
  PUBLIC SECTION.
    METHODS parse REDEFINITION.

  PRIVATE SECTION.
    METHODS parse_line_to_string_table
      IMPORTING
        i_line          TYPE string
      RETURNING
        VALUE(r_result) TYPE string_table.
ENDCLASS.

CLASS lcl_csv_parser_line IMPLEMENTATION.
  METHOD parse.
    " get the lines
    SPLIT i_string AT me->line_separator INTO TABLE DATA(lines).
    DATA open_line TYPE abap_bool VALUE abap_false.
    DATA current_line TYPE string.

    LOOP AT lines ASSIGNING FIELD-SYMBOL(<line>).

      FIND ALL OCCURRENCES OF me->delimiter IN <line> IN CHARACTER MODE MATCH COUNT DATA(count).
      IF ( count MOD 2 ) = 1.
        IF open_line = abap_true.
          current_line = |{ current_line }{ me->line_separator }{ <line> }|.
          open_line = abap_false.
          APPEND parse_line_to_string_table( current_line ) TO r_result.
        ELSE.
          current_line = <line>.
          open_line = abap_true.
        ENDIF.
      ELSE.
        IF open_line = abap_true.
          current_line = |{ current_line }{ me->line_separator }{ <line> }|.
        ELSE.
          APPEND parse_line_to_string_table( <line> ) TO r_result.
        ENDIF.
      ENDIF.
    ENDLOOP.

  ENDMETHOD.

  METHOD parse_line_to_string_table.
    SPLIT i_line AT me->separator INTO TABLE DATA(fields).

    DATA open_field TYPE abap_bool VALUE abap_false.
    DATA current_field TYPE string.

    LOOP AT fields ASSIGNING FIELD-SYMBOL(<field>).
      FIND ALL OCCURRENCES OF me->delimiter IN <field> IN CHARACTER MODE MATCH COUNT DATA(count).
      IF ( count MOD 2 ) = 1.
        IF open_field = abap_true.
          current_field = |{ current_field }{ me->separator }{ <field> }|.
          open_field = abap_false.
          APPEND current_field TO r_result.
        ELSE.
          current_field = <field>.
          open_field = abap_true.
        ENDIF.
      ELSE.
        IF open_field = abap_true.
          current_field = |{ current_field }{ me->separator }{ <field> }|.
        ELSE.
          APPEND <field> TO r_result.
        ENDIF.
      ENDIF.

    ENDLOOP.

    REPLACE ALL OCCURRENCES OF me->escaped_delimiter IN TABLE r_result WITH me->delimiter.

  ENDMETHOD.

ENDCLASS.

*--------------- Find based CSV Parser ------------------------------------*

CLASS lcl_csv_parser_find DEFINITION INHERITING FROM lcl_csv_parser.
  PUBLIC SECTION.
    METHODS parse REDEFINITION.
ENDCLASS.

CLASS lcl_csv_parser_find IMPLEMENTATION.
  METHOD parse.
    DATA line TYPE string_table.
    DATA position TYPE i.
    DATA(string_length) = strlen( i_string ).

    " Dereferencing member fields is slightly slower than variable access, in a tight loop this matters
    DATA(separators) = me->separators.
    DATA(delimiter)  = me->delimiter.

    CHECK string_length <> 0.

    " Checking for delimiters in the DO loop is quite slow.
    " By scanning the whole file once and skipping that check if no delimiter is present,
    " this leads to a slight performance increase of 1s for 1 million rows
    DATA(next_delimiter) = find( val = i_string sub = delimiter ).

    DO.
      DATA(start_position) = position.
      DATA(field) = ``.
      " Check if the field is enclosed in double quotes, as we then need to unescape it
      IF next_delimiter <> -1 AND i_string+position(1) = delimiter.

        start_position = start_position + 1. " literal starts after opening quote

        DO.
          position = find( val = i_string off = position + 1 sub = delimiter ).
          " literal must be closed
          " ASSERT position <> -1.

          DATA(subliteral_length) = position - start_position.
          field = field && substring( val = i_string off = start_position len = subliteral_length ).

          DATA(following_position) = position + 1.
          IF position = string_length OR i_string+following_position(1) <> delimiter.
            " End of literal is reached
            position = position + 1. " skip closing quote
            EXIT. " DO
          ELSE.
            " Found escape quote instead
            position = following_position + 1.
            field = field && me->delimiter.
            " continue searching
          ENDIF.

          " ASSERT sy-index < 1000.
        ENDDO.
      ELSE.
        " Unescaped field, simply find the ending comma or newline
        position = find_any_of( val = i_string off = position + 1 sub = separators ).

        IF position = -1.
          position = string_length.
        ENDIF.

        field = substring( val = i_string off = start_position len = position - start_position ).
      ENDIF.

      APPEND field TO line.

      " Check if line ended and a new line is started
      DATA(current) = substring( val = i_string off = position len = 2 ).
      IF current = me->line_separator.
        APPEND line TO r_result.
        CLEAR line.
        position = position + 2. " skip newline
      ELSE.
        " ASSERT i_string+position(1) = me->separator.
        position = position + 1.
      ENDIF.

      " Check if file ended
      IF position >= string_length.
        RETURN.
      ENDIF.

      " ASSERT sy-index < 100000001.
    ENDDO.

  ENDMETHOD.
ENDCLASS.

* -------------------- Tests -------------------------------------------------------- *

CLASS lcl_test_csv_parser DEFINITION
  FINAL
  CREATE PUBLIC.

  PUBLIC SECTION.
    CLASS-METHODS run.
    CLASS-METHODS get_file_complex
      RETURNING VALUE(r_result) TYPE string.
    CLASS-METHODS get_file_simple
      RETURNING VALUE(r_result) TYPE string.
    CLASS-METHODS get_file_long
      RETURNING VALUE(r_result) TYPE string.
    CLASS-METHODS get_file_longer
      RETURNING VALUE(r_result) TYPE string.
    CLASS-METHODS get_file_mixed
      RETURNING VALUE(r_result) TYPE string.

  PROTECTED SECTION.
  PRIVATE SECTION.

ENDCLASS.

CLASS lcl_test_csv_parser IMPLEMENTATION.

  METHOD get_file_complex.
    DATA(file_line) =
      repeat( val = |"1234,{ cl_abap_char_utilities=>cr_lf }7890",| occ = 9 ) &&
      |"1234,{ cl_abap_char_utilities=>cr_lf }7890"| &&
      cl_abap_char_utilities=>cr_lf.

    r_result = repeat( val = file_line occ = 1000000 ).
  ENDMETHOD.

  METHOD get_file_simple.
    DATA(file_line) =
      repeat( val = |1234567890,| occ = 9 ) &&
      |1234567890| &&
      cl_abap_char_utilities=>cr_lf.

    r_result = repeat( val = file_line occ = 1000000 ).
  ENDMETHOD.

  METHOD get_file_long.
    DATA(file_line) =
      repeat( val = |12345678901234567890,| occ = 4 ) &&
      |12345678901234567890| &&
      cl_abap_char_utilities=>cr_lf.

    r_result = repeat( val = file_line occ = 1000000 ).
  ENDMETHOD.

  METHOD get_file_longer.
    DATA(file_line) =
      repeat( val = |1234567890123456789012345678901234567890,| occ = 2 ) &&
      |1234567890123456789012345678901234567890| &&
      cl_abap_char_utilities=>cr_lf.

    r_result = repeat( val = file_line occ = 1000000 ).
  ENDMETHOD.

  METHOD get_file_mixed.
    DATA(file_line) =
      |1234567890,1234567890,"1234,{ cl_abap_char_utilities=>cr_lf }7890",1234567890,1234567890,1234567890,"1234,{ cl_abap_char_utilities=>cr_lf }7890",1234567890,1234567890,1234567890| &&
      cl_abap_char_utilities=>cr_lf.

    r_result = repeat( val = file_line occ = 1000000 ).
  ENDMETHOD.

  METHOD run.
    DATA prepare_start TYPE timestampl.
    GET TIME STAMP FIELD prepare_start.

    TYPES:
      BEGIN OF t_file,
        name    TYPE string,
        content TYPE string,
      END OF t_file,
      t_files TYPE STANDARD TABLE OF t_file WITH EMPTY KEY.
    DATA(files) = VALUE t_files(
     ( name = `simple`  content = get_file_simple( )  )
     ( name = `long`    content = get_file_long( )    )
     ( name = `longer`  content = get_file_longer( )  )
     ( name = `complex` content = get_file_complex( ) )
     ( name = `mixed`   content = get_file_mixed( )   )
    ).

    DATA prepare_end TYPE timestampl.
    GET TIME STAMP FIELD prepare_end.
    WRITE |Preparation took { cl_abap_tstmp=>subtract( tstmp1 = prepare_end tstmp2 = prepare_start ) }|. SKIP 2.

    WRITE: 'File', 15 'Line Parse', 30 'Find Parse', 45 'Match'. NEW-LINE.
    ULINE.

    LOOP AT files INTO DATA(file).

      WRITE file-name UNDER 'File'.
      DATA line_start TYPE timestampl.
      GET TIME STAMP FIELD line_start.

      DATA(line_parser) = NEW lcl_csv_parser_line( ).
      DATA(line_result) = line_parser->parse( file-content ).

      DATA line_end TYPE timestampl.
      GET TIME STAMP FIELD line_end.
      WRITE |{ cl_abap_tstmp=>subtract( tstmp1 = line_end tstmp2 = line_start ) }s| UNDER 'Line Parse'.

      DATA find_start TYPE timestampl.
      GET TIME STAMP FIELD find_start.

      DATA(find_parser) = NEW lcl_csv_parser_find( ).
      DATA(find_result) = find_parser->parse( file-content ).

      DATA find_end TYPE timestampl.
      GET TIME STAMP FIELD find_end.
      WRITE |{ cl_abap_tstmp=>subtract( tstmp1 = find_end tstmp2 = find_start ) }s| UNDER 'Find Parse'.

      " WRITE COND #( WHEN line_result = find_result THEN 'yes' ELSE 'no' ) UNDER 'Match'.
      NEW-LINE.
    ENDLOOP.
  ENDMETHOD.

ENDCLASS.

START-OF-SELECTION.
  lcl_test_csv_parser=>run( ).
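As a rough, hypothetical sketch of how the parse result could then be moved into a typed internal table (the t_target structure and the sample CSV string below are placeholders, not part of the report above):

" Placeholder target type; adjust the components to the actual CSV layout
TYPES: BEGIN OF t_target,
         field1 TYPE string,
         field2 TYPE string,
       END OF t_target,
       t_targets TYPE STANDARD TABLE OF t_target WITH EMPTY KEY.

DATA target_table TYPE t_targets.

" Sample content; note that the find-based parser expects every line to end with CR_LF
DATA(csv) = |1234,5678{ cl_abap_char_utilities=>cr_lf }abcd,efgh{ cl_abap_char_utilities=>cr_lf }|.

DATA(parser) = NEW lcl_csv_parser_find( ).
DATA(matrix) = parser->parse( csv ).

LOOP AT matrix ASSIGNING FIELD-SYMBOL(<fields>).
  APPEND INITIAL LINE TO target_table ASSIGNING FIELD-SYMBOL(<row>).
  LOOP AT <fields> ASSIGNING FIELD-SYMBOL(<field>).
    " sy-tabix is the index of the current field within the line
    ASSIGN COMPONENT sy-tabix OF STRUCTURE <row> TO FIELD-SYMBOL(<component>).
    IF sy-subrc = 0.
      <component> = <field>.
    ENDIF.
  ENDLOOP.
ENDLOOP.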

Source: https://stackoverflow.com/questions/68840370/how-to-parse-csv-file-in-the-most-performant-way
