当前位置: 面试刷题>> DNA重复问题 (经典算法题500道)


题目描述补充

DNA重复问题: 给定一个DNA序列(由'A', 'T', 'C', 'G'四种字符组成的字符串),找出所有长度至少为10的重复子序列(不必连续),并返回这些重复子序列的集合(不考虑重复元素)。注意,子序列与子串不同,子序列是指从原序列中删除某些(或不删除)字符而不改变剩余字符顺序得到的序列。

示例

输入

"ATGCTATCGCGATGCATCGAT"

输出(假设存在符合条件的重复子序列):

["ATGC", "CGAT", "TGCAT"]

注意:上述输出仅为示例,实际输出取决于给定DNA序列中的重复子序列。

PHP 示例代码

function findRepeatedDnaSubsequences($s, $minLength = 10) {
    $set = [];
    $subsequences = [];
    
    // 生成所有可能的子序列
    for ($i = 0; $i < strlen($s); $i++) {
        $temp = '';
        for ($j = $i; $j < strlen($s); $j++) {
            $temp .= $s[$j];
            if (strlen($temp) >= $minLength) {
                $subsequences[$temp] = isset($subsequences[$temp]) ? $subsequences[$temp] + 1 : 1;
            }
        }
    }

    // 筛选出重复的子序列
    foreach ($subsequences as $sub => $count) {
        if ($count > 1) {
            $set[] = $sub;
        }
    }

    return $set;
}

// 测试
$dna = "ATGCTATCGCGATGCATCGAT";
$result = findRepeatedDnaSubsequences($dna);
print_r($result);

Python 示例代码

def find_repeated_dna_subsequences(s, min_length=10):
    from collections import defaultdict

    subsequences = defaultdict(int)
    for i in range(len(s)):
        for j in range(i, len(s)):
            subseq = s[i:j+1]
            if len(subseq) >= min_length:
                subsequences[subseq] += 1

    return [subseq for subseq, count in subsequences.items() if count > 1]

# 测试
dna = "ATGCTATCGCGATGCATCGAT"
result = find_repeated_dna_subsequences(dna)
print(result)

JavaScript 示例代码

function findRepeatedDnaSubsequences(s, minLength = 10) {
    const subsequences = {};

    for (let i = 0; i < s.length; i++) {
        let temp = '';
        for (let j = i; j < s.length; j++) {
            temp += s[j];
            if (temp.length >= minLength) {
                if (subsequences[temp]) {
                    subsequences[temp]++;
                } else {
                    subsequences[temp] = 1;
                }
            }
        }
    }

    return Object.keys(subsequences).filter(sub => subsequences[sub] > 1);
}

// 测试
const dna = "ATGCTATCGCGATGCATCGAT";
const result = findRepeatedDnaSubsequences(dna);
console.log(result);

码小课:在解决这类算法问题时,关键在于理解子序列与子串的区别,并有效地生成和统计所有可能的子序列。此外,优化算法以减少不必要的计算也非常重要,比如可以考虑使用动态规划或更高效的哈希方法来减少时间复杂度。在码小课网站上,你可以找到更多关于算法和数据结构的深入讲解和实战案例,帮助你更好地掌握编程技能。

推荐面试题