Regular expressions for a string that is not at the start or end

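Regular expressions offer two features called positive lookahead (?=...) and positive lookbehind (?<=...). They are useful for finding a string that is not at the start or end of the input, so this post introduces them.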

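Rather than explaining positive lookahead and positive lookbehind in prose, I think they are easier to understand as source code in the form of a Python unittest, so please see the following.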

import re
import unittest
from typing import List, Pattern


class PositiveLookaheadAndLookbehindTest(unittest.TestCase):
    @classmethod
    def _matches(cls, query: str, pattern: Pattern) -> List[str]:
        return [u.group() for u in pattern.finditer(query)]

    def test_start_of_string(self):
        pattern = re.compile(r"^Apple")
        tests = [
            ("Apple", ["Apple"]),
            ("This is an Apple product.", []),
        ]
        for i, o in tests:
            with self.subTest(i=i):
                self.assertListEqual(self._matches(i, pattern), o)

    def test_positive_lookbehind(self):
        # Positive lookbehind: match "Apple" only when some character precedes it
        pattern = re.compile(r"(?<=.)Apple")
        tests = [
            ("Apple", []),
            ("This is an Apple product.", ["Apple"]),
        ]
        for i, o in tests:
            with self.subTest(i=i):
                self.assertListEqual(self._matches(i, pattern), o)

    def test_end_of_string(self):
        pattern = re.compile(r"Apple$")
        tests = [
            ("Apple", ["Apple"]),
            ("This is an Apple product.", []),
        ]
        for i, o in tests:
            with self.subTest(i=i):
                self.assertListEqual(self._matches(i, pattern), o)

    def test_positive_lookahead(self):
        # Positive lookahead: match "Apple" only when some character follows it
        pattern = re.compile(r"Apple(?=.)")
        tests = [
            ("Apple", []),
            ("This is an Apple product.", ["Apple"]),
        ]
        for i, o in tests:
            with self.subTest(i=i):
                self.assertListEqual(self._matches(i, pattern), o)

I ran this with Python 3.8.10 on macOS 10.15.7 and confirmed that all tests pass.

....
----------------------------------------------------------------------
Ran 4 tests in 0.001s

OK

The _matches() helper returns a list of the substrings that the regular expression matched.
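To see what such a helper returns on its own, here is a minimal sketch outside the test class. The name find_all_matches and the sample sentences are assumptions made for illustration and are not part of the code above.

```python
import re
from typing import List, Pattern


def find_all_matches(query: str, pattern: Pattern) -> List[str]:
    # finditer() yields one match object per non-overlapping match, left to right;
    # group() with no arguments returns the full matched text.
    return [m.group() for m in pattern.finditer(query)]


apple = re.compile(r"Apple")
two_hits = find_all_matches("Apple pie and Apple juice", apple)
no_hits = find_all_matches("Banana bread", apple)
print(two_hits)  # ['Apple', 'Apple']
print(no_hits)   # []
```

An empty list means the pattern did not match anywhere, which is what the tests above rely on.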

As test_start_of_string() and test_end_of_string() show, ^ matches a string at the start of the input and $ matches a string at the end.
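One detail is worth keeping in mind when the input can contain newlines. The sketch below uses sample strings that are assumptions made for illustration. With the default flags, ^ matches only at the very start of the string, while $ matches at the very end and also just before a single trailing newline. \A and \Z are the strict start and end anchors, and re.MULTILINE makes ^ and $ work per line.

```python
import re

# Default flags: ^ matches only at position 0 of the whole string.
start_default = re.findall(r"^Apple", "Pear\nApple")
# re.MULTILINE: ^ also matches right after each newline.
start_multiline = re.findall(r"^Apple", "Pear\nApple", flags=re.MULTILINE)

# $ matches at the end of the string and just before a trailing newline.
end_dollar = re.findall(r"Apple$", "Apple\n")
# \Z matches only at the very end of the string.
end_strict = re.findall(r"Apple\Z", "Apple\n")

print(start_default, start_multiline)  # [] ['Apple']
print(end_dollar, end_strict)          # ['Apple'] []
```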

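As test_positive_lookbehind() shows, a string that is not at the start can be expressed with (?<=.), which checks whether a single character (any character except a newline, under the default flags) exists immediately before the match.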
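The following sketch illustrates two points about the lookbehind: it checks the preceding character without including it in the match, and its behavior depends on how . treats newlines. The sample strings and variable names are assumptions made for illustration, and the negative lookbehind (?<!^) is shown only as one possible alternative, not as the approach used in the tests above.

```python
import re

text = "This is an Apple product."

# A plain "." consumes the preceding character, so it becomes part of the match.
consuming = re.findall(r".Apple", text)
# The lookbehind only checks the preceding character; it is not part of the match.
lookbehind = re.findall(r"(?<=.)Apple", text)
print(consuming, lookbehind)  # [' Apple'] ['Apple']

# By default "." does not match a newline, so "Apple" right after "\n" is missed.
after_newline = re.findall(r"(?<=.)Apple", "\nApple")
# re.DOTALL lets "." match a newline as well.
after_newline_dotall = re.findall(r"(?<=.)Apple", "\nApple", flags=re.DOTALL)
print(after_newline, after_newline_dotall)  # [] ['Apple']

# Alternative: a negative lookbehind on the start anchor ("not at position 0").
not_at_start = re.findall(r"(?<!^)Apple", "\nApple")
at_start = re.findall(r"(?<!^)Apple", "Apple")
print(not_at_start, at_start)  # ['Apple'] []
```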

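As the positive lookahead test (the last test method above) shows, a string that is not at the end can be expressed with (?=.), which checks whether a single character (any character except a newline, under the default flags) exists immediately after the match.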
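As a rough sketch of the same idea for the lookahead, the example below compares it with a consuming ., shows the newline caveat, and then combines both lookarounds to find a string that is neither at the start nor at the end. The sample strings and the name middle_only are assumptions made for illustration.

```python
import re

text = "This is an Apple product."

# A plain "." consumes the following character; the lookahead only checks it.
consuming = re.findall(r"Apple.", text)
lookahead = re.findall(r"Apple(?=.)", text)
print(consuming, lookahead)  # ['Apple '] ['Apple']

# "." skips newlines by default, so "Apple" followed only by "\n" is not matched.
before_newline = re.findall(r"Apple(?=.)", "Apple\n")
before_newline_dotall = re.findall(r"Apple(?=.)", "Apple\n", flags=re.DOTALL)
print(before_newline, before_newline_dotall)  # [] ['Apple']

# Combining both lookarounds: neither at the start nor at the end.
middle_only = re.compile(r"(?<=.)Apple(?=.)")
samples = ["Apple", "Apple pie", "An Apple", "An Apple pie"]
results = {s: middle_only.findall(s) for s in samples}
print(results["An Apple pie"])  # ['Apple']
```

Only the last sample has a character on both sides of Apple, so it is the only one that matches the combined pattern.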

みーのぺーじ

Notes on the PCs and software that みー works with as a hobby: Python, JavaScript, Processing, Unity, and more.
